From: Doug Ewell (doug@ewellic.org)
Date: Mon Jun 28 2010 - 15:22:22 CDT
Mark Crispin <mrc plus unicode at panda dot com> wrote:
> On Mon, 28 Jun 2010, Mark Davis ☕ wrote:
>> The problem with slavishly following the charset parameter is that it
>> is often incorrect. However, the charset parameter is a signal into
>> the character detection module, so if the charset is correctly supplied
>> from the message then the results of the detection will be weighted
>> that direction.
>
> I interpret these two sentences as:
>
> "The problem with following the standards is that some people don't
> follow the standards. So instead of following the standards
> ourselves, we will guess if the other guy follows the standards or
> not, no matter how much he claims to follow standards. Too bad if our
> fix transforms his valid data into garbage."

At the very least, it would be nice if the charset parameter constituted
a much stronger signal into the detection module than it apparently did
in Andreas' case. If he says the text is 8859-15, and we already know
that 8859-15 is nearly impossible to distinguish heuristically from
8859-1, the module might as well take his word for it.
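
To make that concrete, here is a rough Python sketch. It is not how
Google Groups or any real detector works: the function names, the
per-charset score dictionary, and the 0.5 bonus are all hypothetical,
chosen only to illustrate the idea. The first function lists the byte
values that 8859-1 and 8859-15 decode differently; the second adds a
bonus to the declared charset whenever the data decodes cleanly under it.

```python
import codecs


def differing_bytes(a="iso8859-1", b="iso8859-15"):
    """Return the byte values that the two single-byte charsets decode differently."""
    return [i for i in range(256) if bytes([i]).decode(a) != bytes([i]).decode(b)]


def choose_charset(declared, data, scores, declared_bonus=0.5):
    """Pick a charset from hypothetical detector scores, favoring the declared one.

    `scores` maps charset names to made-up detector confidences. The
    declared charset gets `declared_bonus` added, but only if `data`
    actually decodes under it without errors.
    """
    # Normalize names so "ISO-8859-15" and "iso8859-15" compare equal.
    adjusted = {codecs.lookup(name).name: s for name, s in scores.items()}
    declared_name = codecs.lookup(declared).name
    try:
        data.decode(declared_name)  # strict decoding: raises on invalid input
    except UnicodeDecodeError:
        pass  # the declaration is demonstrably wrong, so give it no bonus
    else:
        adjusted[declared_name] = adjusted.get(declared_name, 0.0) + declared_bonus
    return max(adjusted, key=adjusted.get)


data = "5 €".encode("iso8859-15")
print([hex(b) for b in differing_bytes()])
print(choose_charset("ISO-8859-15", data, {"iso8859-1": 0.51, "iso8859-15": 0.49}))
```

Only eight byte values differ between the two charsets, and every byte
sequence is valid in both, so a heuristic has very little to go on. With
scores that close, the declared charset decides the outcome; guessing
8859-1 here would turn the euro sign into a currency sign (¤).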

I do tend to agree with Mark that the complaint against Google Groups
(with which I am not affiliated) might have been posted with more
civility and less invective.

--
Doug Ewell | Thornton, Colorado, USA | http://www.ewellic.org
RFC 5645, 4645, UTN #14 | ietf-languages @ is dot gd slash 2kf0s