RE: [nelocsig] Japanese wave character issue

From: Kenneth Whistler (kenw@sybase.com)
Date: Wed Feb 20 2002 - 14:23:16 EST


Yep, that's right. This is one of a notorious small set of
inconsistencies between the various mappings of JIS X 0208:

Microsoft Code Page 932 mapping (columns: Shift-JIS, Unicode):

0x8160 0xFF5E # FULLWIDTH TILDE

Alternative JIS X 0208 Shift-JIS mapping (e.g. for the Mac), with
columns Shift-JIS, JIS X 0208, Unicode:

0x8160 0x2141 0x301C # WAVE DASH
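A minimal Python sketch of the discrepancy, assuming Python 3 and its built-in "cp932" codec (Microsoft's table) and "shift_jis" codec (a JIS X 0208 based table). Neither codec is mentioned in this thread, and the helper name describe() is hypothetical. Decoding the same byte pair with each codec yields the two different code points listed above:

```python
# Sketch: decode the same Shift-JIS byte pair (0x81 0x60) with two codecs.
# "cp932" follows Microsoft's Code Page 932 table; "shift_jis" follows a
# JIS X 0208 based table. describe() is a hypothetical helper.
SJIS_WAVE = b"\x81\x60"


def describe(codec: str) -> str:
    """Return the code point that SJIS_WAVE decodes to under `codec`."""
    ch = SJIS_WAVE.decode(codec)
    return f"{codec}: U+{ord(ch):04X}"


for codec in ("cp932", "shift_jis"):
    print(describe(codec))
# cp932: U+FF5E
# shift_jis: U+301C
```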

Actually, the Unicode Consortium has not (as yet) taken a formal
position on which of these conversions is correct. Mapping tables
are simply supplied by the various vendors, and their interpretations
of the mappings may be inconsistent.

My *personal* opinion is that Microsoft has it right, as SJIS
0x8160 is treated as a fullwidth tilde in Japan, and is
generally shown that way in widely available commercial fonts.

When databases do round-trip conversions through Unicode,
they need to be aware of these exceptional cases in the conversions,
precisely to avoid the kind of data corruption you are encountering.
There is no simple, universal "fix" for this, since each platform
does the conversions it does, and other applications need to
take the edge cases into account.
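One possible mitigation, sketched in Python under the same assumptions (built-in "cp932" and "shift_jis" codecs): fold one code point onto the other at the boundary where text leaves Unicode, so the result does not depend on how a particular converter treats the "other" code point (converters variously reject, substitute, or best-fit it). The tables TO_CP932 and TO_JIS and the function encode_for() are hypothetical, and they cover only the wave dash pair; any other known pairs would have to be added to the same tables.

```python
# Sketch: fold the wave dash / fullwidth tilde pair onto whichever code point
# the target table maps, then encode strictly. TO_CP932, TO_JIS and
# encode_for() are hypothetical names covering only this one pair.
TO_CP932 = str.maketrans({"\u301c": "\uff5e"})  # WAVE DASH -> FULLWIDTH TILDE
TO_JIS = str.maketrans({"\uff5e": "\u301c"})    # FULLWIDTH TILDE -> WAVE DASH


def encode_for(text: str, codec: str) -> bytes:
    """Normalize the known pair for `codec`, then encode (errors raise)."""
    if codec == "cp932":
        text = text.translate(TO_CP932)
    elif codec == "shift_jis":
        text = text.translate(TO_JIS)
    return text.encode(codec)  # strict: any other unmapped character still fails


# Data written by the JIS-style program holds U+301C; prepare it for CP932.
stored = b"\x81\x60".decode("shift_jis")
print(encode_for(stored, "cp932"))  # b'\x81`'  (0x81 0x60)
```

Encoding strictly, rather than with errors="replace", means any further unmapped character surfaces as an error instead of silently turning into "?", which is the kind of silent loss described in the original question.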

The UTC has suggested an approach of documenting all the known
issues, particularly for Shift-JIS mappings, the most problematical
of the lot, but as yet no particular progress has been made on
this suggestion.
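As an illustration of what such documentation could start from, here is a Python sketch (again assuming the built-in "cp932" and "shift_jis" codecs; mapping_differences() is a hypothetical name) that walks the double-byte Shift-JIS range and reports every code that both codecs decode, but to different characters. Codes that either codec rejects are skipped, so vendor extensions present only in CP932 are not listed:

```python
from __future__ import annotations

# Sketch: list double-byte Shift-JIS codes that two codecs decode differently.
# mapping_differences() is a hypothetical helper, not a library function.


def mapping_differences(codec_a: str = "cp932",
                        codec_b: str = "shift_jis") -> dict[int, tuple[str, str]]:
    """Return {sjis_code: (char_a, char_b)} where both decode but disagree."""
    diffs: dict[int, tuple[str, str]] = {}
    lead_bytes = list(range(0x81, 0xA0)) + list(range(0xE0, 0xF0))
    trail_bytes = [t for t in range(0x40, 0xFD) if t != 0x7F]
    for lead in lead_bytes:
        for trail in trail_bytes:
            raw = bytes((lead, trail))
            try:
                ch_a = raw.decode(codec_a)
                ch_b = raw.decode(codec_b)
            except UnicodeDecodeError:
                continue  # unassigned in at least one of the two tables
            if ch_a != ch_b:
                diffs[(lead << 8) | trail] = (ch_a, ch_b)
    return diffs


for code, (a, b) in sorted(mapping_differences().items()):
    print(f"0x{code:04X}  cp932 U+{ord(a):04X}  shift_jis U+{ord(b):04X}")
```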

--Ken

> The note below came through the NELOCSIG list, but I'm assuming someone
> on this list may be able to give Laura some suggestions.
>
> -----Original Message-----
> From: Nelson, Laura [mailto:lnelson@kenan.com]
> Sent: Wednesday, February 20, 2002 1:04 PM
> To: 'nelocsig@yahoogroups.com'
> Subject: [nelocsig] Japanese wave character issue
>
>
>
> We have a situation where an important character, the Japanese "wave
> character", is lost during transfers from various parts of our software.
> The root cause is that Windows uses a different encoding than does the
> rest of the world.
>
> Data is entered into our database by one program which uses the more
> standard conversion to UTF8, and then read by another program using the
> Windows version. It displays as garbage, because the wave character gets
> lost in the conversion.
>
> There are other potential conversion issues with the same character,
> because it is non-standard.
> Does anyone have any suggestions?
> The encodings in question are:
> U+FF5E used by Windows
> U+301C used by JIS X 0221, Unicode Consortium, Java (SJIS, EUCJIS, and
> JIS), and Mac.
> The SHIFT-JIS character is 0x8160
>



This archive was generated by hypermail 2.1.2 : Wed Feb 20 2002 - 14:05:06 EST