Re: Code Page 932 vendor mapping

From: Markus Scherer (markus.icu@gmail.com)
Date: Tue Oct 04 2005 - 11:46:27 CST

    The ibm-943_P15A table in ICU's default data differs slightly from the
    Windows code page 932 conversion behavior. As far as I remember, about 7
    roundtrip mappings differ (out of more than 8000).

    In particular, IBM's Unicode conversion tables for "PC codepages"
    rotate some control codes, as you showed. (They map 0x1A, ctrl-Z, to
    U+001C because DOS/Windows interpret 0x1A as end-of-text-file.)
    These control characters are rarely used, so this difference is
    usually benign.
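
    A minimal sketch of what that rotation looks like next to the Windows
    behavior, using Python's built-in cp932 codec as a stand-in for the
    Windows table. Only the 0x1A -> U+001C entry comes from the text above;
    the other two entries, and the names IBM_PC_CONTROL_ROTATION and
    decode_ibm_style, are assumptions made for illustration and are not
    taken from ICU.

```python
# Assumed three-way rotation of control codes in IBM-style "PC codepage"
# tables. Only 0x1A -> U+001C is stated in the message; the entries for
# 0x1C and 0x7F are an assumption that completes the rotation.
IBM_PC_CONTROL_ROTATION = {0x1A: 0x001C, 0x1C: 0x007F, 0x7F: 0x001A}


def decode_windows(data: bytes) -> str:
    """Decode with Python's cp932 codec (control codes map to themselves)."""
    return data.decode("cp932")


def decode_ibm_style(data: bytes) -> str:
    """Decode with cp932, then apply the assumed control-code rotation.

    This only imitates the control-code difference; it does not reproduce
    the other differing mappings of the real ibm-943_P15A table.
    """
    return data.decode("cp932").translate(IBM_PC_CONTROL_ROTATION)


if __name__ == "__main__":
    sample = b"A\x1aB"  # text containing a ctrl-Z byte
    print(repr(decode_windows(sample)))
    print(repr(decode_ibm_style(sample)))
```

    For text without these three control bytes, both functions return the
    same string, which is why the difference rarely shows up in practice.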

    The ICU collection of conversion tables includes "true" Windows
    tables. You can add them to your ICU distribution, or use them with
    ICU from your application data. See
    http://icu.sourceforge.net/charts/charset/

    (Some of these are part of ICU's default data; we included those that
    did not have close matches among IBM tables.)

    markus

    On 10/3/05, Tim Greenwood <timothy.greenwood@gmail.com> wrote:
    > Is the data on
    > http://www.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WINDOWS/CP932.TXT
    > - giving the Unicode map for code page 932 (SJIS) still accurate? This chart
    > from 1998...

    The one on http://icu.sourceforge.net/charts/charset/ should match
    Windows XP conversion.
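
    One way to check a mapping table against a particular converter is to
    parse the table and decode every listed byte sequence. The sketch below
    assumes the whitespace-separated "0xBYTES 0xUNICODE # comment" layout
    used by the unicode.org mapping files, and uses Python's built-in cp932
    codec in place of Windows or ICU. The SAMPLE lines are a small
    hand-written excerpt believed to follow that layout, not the real file,
    and parse_mapping and mismatches are hypothetical helper names.

```python
SAMPLE = (
    "0x41\t0x0041\t# LATIN CAPITAL LETTER A\n"
    "0x81\t      \t# DBCS LEAD BYTE\n"
    "0x8140\t0x3000\t# IDEOGRAPHIC SPACE\n"
    "0x8160\t0xFF5E\t# FULLWIDTH TILDE\n"
    "0x82A0\t0x3042\t# HIRAGANA LETTER A\n"
)


def parse_mapping(text):
    """Parse mapping lines into a {byte sequence: character} dict."""
    table = {}
    for line in text.splitlines():
        fields = line.split("#", 1)[0].split()  # drop comment, split columns
        if len(fields) < 2:
            continue  # blank line, or a code with no Unicode column
        code = int(fields[0], 16)
        raw = code.to_bytes(2 if code > 0xFF else 1, "big")
        table[raw] = chr(int(fields[1], 16))
    return table


def mismatches(table, codec):
    """Return (bytes, expected, actual) for entries the codec decodes differently."""
    diffs = []
    for raw, expected in sorted(table.items()):
        try:
            actual = raw.decode(codec)
        except UnicodeDecodeError:
            actual = None  # the codec rejects this byte sequence
        if actual != expected:
            diffs.append((raw, expected, actual))
    return diffs


if __name__ == "__main__":
    for raw, expected, actual in mismatches(parse_mapping(SAMPLE), "cp932"):
        print(raw.hex(), repr(expected), repr(actual))
```

    This only checks the bytes-to-Unicode direction; comparing roundtrip
    behavior would also need an encode step for each character.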

    --
    Opinions expressed here may not reflect my company's positions unless
    otherwise noted.
    

