From: Keutgen, Walter (walter.keutgen@be.unisys.com)
Date: Fri May 12 2006 - 12:51:22 CDT
Doug,
your excellent conclusion leaves autodetection aside.
Philippe Verdi wrote:
> Doesn't it break or severely limit the encoding autodetection in IE?
> This may explain why IE so often displays Chinese characters in the middle
> of a French webpage hosted on a server that simply does not specify its
> actual encoding: IE returns a false positive match with UTF-8, instead
> of identifying the ISO-8859-1 encoding that was actually used.
>
> This is a severe and very annoying bug for users (like French users trying
> to read webpages that were encoded as ISO-8859-1 but interpreted by default
> as UTF-8 as if it was Chinese, even though it would be invalid UTF-8).
Microsoft should disregard ill-formed UTF-8 sequences when determining the coded character set.
Alternatively, would it not be simpler to stick to the standards and choose ISO-8859-1 when the HTML source does not declare a charset? More philosophically, is it really wise to try to do better than the standards?
The reader can still correct this by choosing the appropriate encoding. Microsoft could then satisfy everybody by offering 'UTF-8 strict' and 'UTF-8 liberal' or, better, by asking the user in a pop-up dialogue whether to accept ill-formed sequences when the UTF-8 stream contains them.
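To make the proposal concrete, here is a minimal sketch in Python of what I have in mind. It says nothing about how IE actually works; the function names guess_charset and decode_page and the liberal flag are my own invention for illustration. The page counts as UTF-8 only if every byte sequence is well formed; otherwise the fallback is ISO-8859-1.

```python
def guess_charset(raw: bytes) -> str:
    """Hypothetical detector: 'utf-8' only if raw is entirely well-formed UTF-8."""
    try:
        # Python's strict decoder rejects any ill-formed sequence.
        raw.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        # A single ill-formed sequence rules UTF-8 out; use the fallback.
        return "iso-8859-1"
    return "utf-8"


def decode_page(raw: bytes, liberal: bool = False) -> str:
    """Hypothetical 'UTF-8 strict' versus 'UTF-8 liberal' choice."""
    if liberal:
        # Liberal: force UTF-8 and show U+FFFD for each ill-formed sequence.
        return raw.decode("utf-8", errors="replace")
    # Strict: decode with whatever charset the detector settled on.
    return raw.decode(guess_charset(raw))


if __name__ == "__main__":
    french = "Élève à Liège".encode("iso-8859-1")
    print(guess_charset(french))
    print(decode_page(french))
```

A French ISO-8859-1 page with accented letters almost never forms valid UTF-8, because an accented letter followed by an ASCII letter looks like a lead byte without its continuation byte. Pure ASCII pages are reported as UTF-8, which is harmless since both encodings decode them identically.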
Best regards
Walter Keutgen
Unisys Belgium
-----Original Message-----
From: unicode-bounce@unicode.org [mailto:unicode-bounce@unicode.org] On Behalf Of Doug Ewell
Sent: 12 May 2006 17:45
To: Unicode Mailing List
Subject: Re: Win IE 7b2 and UTF-8
Through the years, Microsoft and especially IE have taken a great deal
of criticism for being either too liberal or too conservative (or both)
in what they accept. Whichever they choose, there is sure to be someone
waiting in the wings to lambast them for it.
IMHO, what Microsoft should do with regard to decoding invalid UTF-8
sequences is make a decision, one way or the other, and document that
decision openly. That way the debate, and there is sure to be one, will
have to focus on the policy and not whether the software is "buggy."
My personal preference (RFC 793 notwithstanding) would be for IE to
decline to interpret invalid UTF-8, since that is the more secure
approach. As Philippe himself pointed out, there's probably not much of
this type of data out there. But it is their call.
-- Doug Ewell Fullerton, California, USA http://users.adelphia.net/~dewell/