Re: Frequent incorrect guesses by the charset autodetection in IE7

From: Doug Ewell (dewell@adelphia.net)
Date: Wed Jul 19 2006 - 09:00:18 CDT


    Samuel Thibault <samuel dot thibault at labri dot fr> wrote:

    >> Microsoft does support ISO-8859-x in its word processors, using the
    >> support from the OS. Also indirectly through the Windows codepages
    >> 125x which are extensions of ISO-8859-x where C1 controls have been
    >> replaced by other characters (for example Windows 1252 supports
    >> ISO-8859-1 except C1 controls,
    >
    > Yes, and that's where they begin to do funny things like inserting
    > CP1252's single quotation mark in a text and then claiming that this
    > is ISO-8859-1... (just the same for the Euro symbol etc).

    Facts:

    1. Windows code page 1252 is a variant of ISO 8859-1 that replaces most
    of the C1 control characters (from 0x80 through 0x9F), which are used by
    an exceptionally small number of programs and processes, with graphic
    characters such as the euro symbol and curly quotes. For all practical
    purposes it can be considered a superset of ISO 8859-1.

    2. Some Microsoft programs, such as Word, replace the "straight" ASCII
    apostrophe and quotation marks typed by the user with the "smart"
    apostrophe and quotation marks from the extended, non-ASCII, non-8859-1
    range. Most of the time, this seems to be what users want for "good
    typography."

    3. If these characters are saved as UTF-8 or as Windows-1252, and
    labeled as such, everything is OK.

    4. Unfortunately, there are some versions of some programs (Outlook?
    Outlook Express?) that take Windows-1252 text and label it as ISO 8859-1
    for interchange. If the text contains curly quotes, euro symbols, or
    other "extended" characters from the 0x80-0x9F range, the label is
    incorrect and a receiver that honors it will display them badly. I
    don't know if current software versions still do this.

    5. None of this is new information, and Microsoft is aware of all of
    it. And none of it implies that there is anything wrong, or evil, or
    "illegal" in creating and using an 8-bit character set that is spun off
    from an ISO 8859 character set. The problem is in the mislabeling.
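
    To make points 1, 3 and 4 concrete, here is a minimal Python sketch.
    It assumes Python's built-in "cp1252" and "latin-1" codecs as
    stand-ins for Windows-1252 and ISO 8859-1; the sample string and the
    helper names (c1_controls, cp1252_graphic_positions) are invented
    for illustration and do not come from any Microsoft product.

```python
# Sample text with curly quotes and a euro sign (hypothetical example data).
text = "\u201cprice: 5\u20ac\u201d"

# Saved as Windows-1252: the three non-ASCII characters land in 0x80-0x9F.
raw = text.encode("cp1252")

# Correctly labeled: decoding as Windows-1252 round-trips the text.
correct = raw.decode("cp1252")

# Mislabeled as ISO 8859-1: the same bytes decode to C1 control characters.
mislabeled = raw.decode("latin-1")


def c1_controls(s):
    """List the code points in s that fall in the C1 range U+0080-U+009F."""
    return [f"U+{ord(ch):04X}" for ch in s if 0x80 <= ord(ch) <= 0x9F]


def cp1252_graphic_positions():
    """Return the byte values in 0x80-0x9F that Python's cp1252 codec defines."""
    defined = []
    for b in range(0x80, 0xA0):
        try:
            bytes([b]).decode("cp1252")
            defined.append(b)
        except UnicodeDecodeError:
            # Position left unassigned by this codec.
            pass
    return defined


# Outside 0x80-0x9F the two codecs agree byte for byte.
same_elsewhere = all(
    bytes([b]).decode("cp1252") == bytes([b]).decode("latin-1")
    for b in list(range(0x00, 0x80)) + list(range(0xA0, 0x100))
)

print(raw)
print(correct == text)
print(c1_controls(mislabeled))
print(len(cp1252_graphic_positions()), same_elsewhere)
```

    With the correct label the text survives intact; with the wrong
    label the quotes and the euro sign turn into invisible control
    codes, which is the display problem described in point 4. The bytes
    themselves are the same in both cases.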

    --
    Doug Ewell
    Fullerton, California, USA
    http://users.adelphia.net/~dewell/
    

