Smita Desai asked:
>
> According to Microsoft KnowledgeBase article Q170559, there are 398
> characters that do inaccurate round trip mapping, between SJIS and
> Unicode. Some of the examples are as follows:
> Code page 932
>
> 0x879c --> Ux222a --> 0x81be
> 0xed40 --> Ux7e8a --> 0xfa5c
> 0xed41 --> Ux891c --> 0xfa5d
>
> According to that article, these are duplicates that do not round trip
> map and were added for NEC needs.

Yes, there are a large number of duplicated characters in Microsoft Code Page
932. There is even one character that is triplicated. These create a
roundtrip problem for any straightforward conversion scheme.
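
To make the roundtrip problem concrete, here is a minimal sketch in Python. It uses hand-written tables that contain only two of the mappings quoted in this thread; the names CP932_TO_UNICODE, UNICODE_TO_CP932 and round_trip are made up for illustration and do not come from any library.

```python
# Only mappings quoted in this thread; a real converter needs the full table.
CP932_TO_UNICODE = {
    0x81BE: 0x222A,  # JIS X 0208 code for UNION
    0x879C: 0x222A,  # NEC duplicate of the same character
    0xFA5C: 0x7E8A,
    0xED40: 0x7E8A,  # duplicate of 0xFA5C
}

# Unicode -> CP932 can hold only one target per character, so the
# duplicate codes have nothing that maps back to them.
UNICODE_TO_CP932 = {
    0x222A: 0x81BE,
    0x7E8A: 0xFA5C,
}


def round_trip(code):
    """Convert a CP932 code to Unicode and back again."""
    return UNICODE_TO_CP932[CP932_TO_UNICODE[code]]


for code in sorted(CP932_TO_UNICODE):
    back = round_trip(code)
    status = "ok" if back == code else "changed"
    print(f"0x{code:04X} -> U+{CP932_TO_UNICODE[code]:04X} -> 0x{back:04X} ({status})")
```

The two duplicate codes come back as different byte values, although the character they stand for is unchanged.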
>
> Does anyone have any background info? Is the only solution to create a
> table in the code, which would have a bad performance hit? If these
> are duplicates, then does it matter that they do not round trip map?
>
I don't understand your comment about creating a "table in the code".
You have to have a table to do the conversion at all.

If you *must* have roundtrip conversion fidelity for Code Page 932
data, then what you have to do is tag your converted data with the
source code point, or use user-defined code points for the duplicated
characters. In either case, the code handling the Unicode-converted
data would need to know how to handle those extensions.
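
Both options can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a complete converter: the tables hold only two of the duplicates quoted in this thread, every function name is hypothetical, and PUA_BASE is an arbitrary choice inside the Private Use Area. A real scheme would have to pick private-use code points that do not collide with ones already in use, since Windows itself maps the CP932 user-defined range into the Private Use Area.

```python
# Duplicate CP932 code -> the Unicode code point it converts to.
DUPLICATES = {
    0x879C: 0x222A,
    0xED40: 0x7E8A,
}
# The codes that Unicode converts back to, and the reverse table.
CANONICAL = {0x81BE: 0x222A, 0xFA5C: 0x7E8A}
ENCODE = {cp: code for code, cp in CANONICAL.items()}

PUA_BASE = 0xF000  # assumption: arbitrary start in the Private Use Area


# Option 1: tag each converted character with its source code point.
def decode_tagged(code):
    """Return (Unicode code point, original CP932 code)."""
    cp = DUPLICATES[code] if code in DUPLICATES else CANONICAL[code]
    return cp, code


def encode_tagged(tagged):
    """The tag wins, so the original code is always recovered."""
    return tagged[1]


# Option 2: give each duplicate its own user-defined code point.
TO_PUA = {code: PUA_BASE + i for i, code in enumerate(sorted(DUPLICATES))}
FROM_PUA = {cp: code for code, cp in TO_PUA.items()}


def decode_pua(code):
    return TO_PUA[code] if code in TO_PUA else CANONICAL[code]


def encode_pua(cp):
    return FROM_PUA[cp] if cp in FROM_PUA else ENCODE[cp]


def fold(cp):
    """Map a private-use stand-in to the real character, e.g. for display."""
    return DUPLICATES[FROM_PUA[cp]] if cp in FROM_PUA else cp
```

Either way the roundtrip is exact, but the cost is the one described above: anything that searches, compares, or displays the converted text has to understand the tag, or call something like fold() first.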

For most purposes, I doubt that roundtripping these particular
characters matters, but it would depend on the application.

--Ken Whistler
This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:20:41 EDT