Re: Win IE 7b2 and UTF-8

From: Mark Davis (mark.davis@icu-project.org)
Date: Sat May 13 2006 - 12:32:58 CDT


    One option is to map any ill-formed UTF-8 sequence to a safe replacement,
    like U+FFFD. That prevents the non-shortest form sequences from causing
    security problems.
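
    As a minimal sketch of that mapping, assuming Python's built-in UTF-8
    codec with errors="replace" (the helper name decode_utf8_safely is
    hypothetical, and nothing here describes what IE actually does), the
    non-shortest-form encoding of "/" becomes U+FFFD instead of a slash:

```python
# Hypothetical helper name for illustration; not from the original message.
def decode_utf8_safely(data: bytes) -> str:
    """Decode bytes as UTF-8, mapping ill-formed sequences to U+FFFD."""
    return data.decode("utf-8", errors="replace")


# 0xC0 0xAF is a non-shortest-form (overlong) encoding of "/" (U+002F).
overlong_slash = b"..\xc0\xaf.."
decoded = decode_utf8_safely(overlong_slash)

# The overlong bytes never become a real "/", so a later check for path
# separators cannot be bypassed by this input.
assert "/" not in decoded
assert "\ufffd" in decoded

# A strict decoder rejects the same input outright.
try:
    overlong_slash.decode("utf-8")  # errors="strict" is the default
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False
```

    The point of the replacement is that a decoder which neither rejects
    the input nor substitutes for it could hand "/" to code that has
    already checked the raw bytes for it.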

    Mark

    On 5/12/06, Doug Ewell <dewell@adelphia.net> wrote:
    >
    > Keutgen, Walter <walter dot keutgen at be dot unisys dot com> wrote:
    >
    > > Microsoft should leave ill-formed UTF-8 sequences aside when
    > > determining the coded character set.
    >
    > I agree that if encodings need to be autodetected, allowing invalid
    > UTF-8 to be handled as though it were valid UTF-8 hampers that effort.
    > It is a shame -- but, as Mark Davis said, probably a given -- that
    > autodetection is necessary at all.
    >
    > > Or alternatively, would it not be simpler to stick to the standards
    > > and choose ISO-8859-1 when the HTML source does not specify a
    > > charset?
    >
    > Actually, the code to do what IE does is of about equal complexity to
    > the code to interpret UTF-8 strictly. I doubt it had anything to do
    > with that.
    >
    > > More philosophically, is it really better to try making it better than
    > > the standards?
    >
    > I *strongly* doubt that Microsoft is trying to reinvent UTF-8. As I
    > said, they were probably trying to "be liberal in what they accept," and
    > not have people throw eggs at their windows because some badly encoded
    > Web page wouldn't display.
    >
    > > The reader can still correct this by choosing the appropriate encoding.
    > > Then Microsoft could satisfy everybody by offering 'UTF-8 strict' and
    > > 'UTF-8 liberal' or, better, if the UTF-8 stream contains ill-formed
    > > sequences, asking the user in a pop-up dialogue whether to accept them.
    >
    > How many users who do not subscribe to the Unicode list would understand
    > how to use an option like that?
    >
    > --
    > Doug Ewell
    > Fullerton, California, USA
    > http://users.adelphia.net/~dewell/
    >


