Re: Undefined code positions in 8-bit character sets

From: Richard Wordingham (richard.wordingham@ntlworld.com)
Date: Mon May 05 2008 - 12:35:40 CDT


    Andreas Prilop wrote on Monday, May 05, 2008 4:30 PM

    >I refer to
    > http://www.unicode.org/Public/MAPPINGS/ISO8859/8859-1.TXT
    > http://www.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WINDOWS/CP1252.TXT
    >
    > In ISO-8859-1, code position 0x90 is mapped to U+0090.
    > In Windows-1252, code position 0x90 is listed as "undefined".
    >
    > Why are they treated differently?
    > International Standard ISO/IEC 8859-1 does *not* define
    > code position 0x90. So it might also be listed as "undefined".

    0x90 is defined in the IANA version of ISO-8859-1, which refers to the
    description in RFC 1345. In a web context, I believe the IANA definition
    should take precedence over the ISO/IEC one.

    On the other hand, Windows-1252 might be extended again, with a meaning
    assigned to 0x90, so it is probably better not to map any Unicode code
    point to that value.

    > Or, for purely practical reasons, 0x90 in Windows-1252 might
    > also be mapped to U+0090.

    Which is reported to be what Windows *currently* actually does.
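
    The difference between the two mapping tables can be seen in a short
    sketch. It assumes Python's built-in "iso-8859-1" and "cp1252" codecs,
    which follow the published mapping tables rather than the reported
    Windows behaviour; decode_or_none is a hypothetical helper introduced
    only for this illustration.

```python
# Sketch: compare how two Python codecs treat the byte 0x90.
# decode_or_none is a hypothetical helper, not part of any library.

def decode_or_none(raw: bytes, encoding: str):
    """Return the decoded string, or None if the codec rejects the bytes."""
    try:
        return raw.decode(encoding)
    except UnicodeDecodeError:
        return None

# ISO-8859-1: every byte maps to the code point with the same number,
# so 0x90 becomes the C1 control character U+0090.
latin1_result = decode_or_none(b"\x90", "iso-8859-1")

# cp1252: 0x90 is left undefined in the codec's table, so decoding fails.
cp1252_result = decode_or_none(b"\x90", "cp1252")

print(repr(latin1_result))
print(repr(cp1252_result))
```

    A decoder that wanted to follow the reported Windows behaviour instead
    would have to fall back to U+0090 itself when the cp1252 codec rejects
    the byte.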

    Richard.


