From: Markus Scherer (markus.icu@gmail.com)
Date: Tue Oct 04 2005 - 11:46:27 CST
The ibm-943_P15A table in ICU's default data differs slightly from the
Windows 932 conversion behavior. I seem to remember that about 7
roundtrip mappings differ (out of more than 8000).
In particular, IBM's Unicode conversion tables for "PC codepages"
rotate some control codes as you showed. (The intent is to map 0x1A
ctrl-Z to U+001C, because DOS/Windows interpret 0x1A ctrl-Z as
end-of-text-file.) These control characters are rarely used, so this
difference is usually benign.
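To make the control-code rotation concrete, here is a minimal Python sketch. It uses Python's built-in cp932 codec as a stand-in for Windows-style behavior and applies a rotation on top of it. Only the 0x1A to U+001C mapping comes from this message; the other two entries, and the function names, are assumptions for illustration and not ICU API.

```python
# Assumed rotation for IBM "PC codepage" tables. Only 0x1A -> U+001C is
# stated in this message; the other two entries are assumptions that
# close the cycle.
IBM_PC_CONTROL_ROTATION = {0x1A: 0x1C, 0x1C: 0x7F, 0x7F: 0x1A}


def decode_windows_style(data: bytes) -> str:
    """Decode with Python's cp932 codec, which leaves control codes as-is."""
    return data.decode("cp932")


def decode_ibm_style(data: bytes) -> str:
    """Decode with cp932, then apply the assumed control-code rotation."""
    text = data.decode("cp932")
    # str.translate applies all three substitutions in one pass, so the
    # entries do not chain into each other.
    return text.translate(IBM_PC_CONTROL_ROTATION)
```

Text without these three control codes decodes identically both ways, which is why the difference is rarely noticed.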
The ICU collection of conversion tables includes "true" Windows
tables. You can add them to your ICU distribution, or use them with
ICU from your application data. See
http://icu.sourceforge.net/charts/charset/
(Some of these are part of ICU's default data; we included those that
did not have close matches among IBM tables.)
markus
On 10/3/05, Tim Greenwood <timothy.greenwood@gmail.com> wrote:
> Is the data on
> http://www.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WINDOWS/CP932.TXT
> - giving the Unicode map for code page 932 (SJIS) still accurate? This chart
> from 1998...
The one on http://icu.sourceforge.net/charts/charset/ should match
Windows XP conversion.
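One way to check a mapping file against a converter is to parse it and compare entry by entry. The sketch below assumes the three-column layout used by the unicode.org vendor files (byte code in hex, Unicode code point in hex, then a "#" comment) and compares against Python's cp932 codec rather than ICU or Windows itself. The function names and the sample lines are hypothetical, written by hand for illustration.

```python
def parse_mapping_lines(lines):
    """Parse '0xCODE<TAB>0xUNICODE<TAB># name' lines into {bytes: str}."""
    table = {}
    for line in lines:
        line = line.split("#", 1)[0].strip()  # drop the trailing comment
        if not line:
            continue
        fields = line.split()
        if len(fields) < 2:
            continue  # code listed without a Unicode mapping
        code = int(fields[0], 16)
        raw = code.to_bytes(2 if code > 0xFF else 1, "big")
        table[raw] = chr(int(fields[1], 16))
    return table


def diff_against_codec(table, codec="cp932"):
    """Return (bytes, expected, actual) for entries the codec decodes differently."""
    diffs = []
    for raw, expected in sorted(table.items()):
        try:
            actual = raw.decode(codec)
        except UnicodeDecodeError:
            actual = None  # the codec has no mapping for these bytes
        if actual != expected:
            diffs.append((raw, expected, actual))
    return diffs


# Hand-written sample in the assumed file format.
SAMPLE = [
    "0x1A\t0x001A\t# SUBSTITUTE",
    "0x41\t0x0041\t# LATIN CAPITAL LETTER A",
    "0x80\t      \t# UNDEFINED",
    "0x82A0\t0x3042\t# HIRAGANA LETTER A",
]

if __name__ == "__main__":
    for raw, expected, actual in diff_against_codec(parse_mapping_lines(SAMPLE)):
        print(raw.hex(), repr(expected), repr(actual))
```

Running the same comparison over a full table would list every differing mapping, including any rotated control codes.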
-- Opinions expressed here may not reflect my company's positions unless otherwise noted.