Re: Unused code positions and mapping to Unicode

From: Rick McGowan (rmcgowan@apple.com)
Date: Thu Aug 05 1999 - 20:05:27 EDT


Hi. I can't answer what others are doing, but...

> What is the proper mapping to Unicode of unused characters in a legacy encoding?

It is U+FFFD.

> - Nadine's book shows it mapped to U+FFFE.

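That is just "wrong"; it should never be mapped to that for any purpose. U+FFFE is a noncharacter: it is what the byte order mark U+FEFF looks like when read with the wrong byte order.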

> - Java seems to map it to U+FFFD.

That's better, if it's really unused in the source.

> - The mapping tables on the FTP site have it listed as undefined and
> don't give a Unicode value.

I believe they will be updated to map control characters straight across, but the
mappings for the Windows code pages are provided by Microsoft, so Microsoft will
make any updates as required. In the case of 0x81 in a Windows code page, I
would probably map it to U+0081 myself: you never know whether it will be
assigned at a later time, or whether someone is using it for something devious
of their own devising.
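A minimal sketch of the two policies discussed above, in Python: map an unused legacy position to U+FFFD, or pass it straight across (0x81 to U+0081). The table, the function name `decode`, and the `passthrough` flag are hypothetical illustrations, not taken from any published mapping table, and only a few entries of a Windows-1252-like table are shown.

```python
from typing import Dict, Optional

REPLACEMENT = "\ufffd"  # U+FFFD REPLACEMENT CHARACTER

# Hypothetical excerpt of a single-byte, Windows-1252-like table.
# None marks a code position that is unused in the source encoding;
# bytes missing from the table are treated the same way.
SAMPLE_TABLE: Dict[int, Optional[str]] = {
    0x41: "A",
    0x80: "\u20ac",  # EURO SIGN
    0x81: None,      # unused in Windows-1252
    0x82: "\u201a",  # SINGLE LOW-9 QUOTATION MARK
}

def decode(data: bytes, table: Dict[int, Optional[str]],
           passthrough: bool = False) -> str:
    """Decode bytes with a table; unused positions become U+FFFD,
    or, if passthrough is set, the code point with the same value."""
    out = []
    for b in data:
        ch = table.get(b)
        if ch is not None:
            out.append(ch)
        elif passthrough:
            # Map the unused byte straight across, e.g. 0x81 -> U+0081.
            out.append(chr(b))
        else:
            # Unmappable: substitute the replacement character.
            out.append(REPLACEMENT)
    return "".join(out)
```

With the default policy, `decode(b"A\x81", SAMPLE_TABLE)` yields `"A\ufffd"`; with `passthrough=True` it yields `"A\x81"`, which preserves the byte value in case the position is assigned later.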

Here in my world, we map unmappable stuff into Unicode at U+FFFD, and never
map back.
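To illustrate why a U+FFFD mapping is one-way, here is a sketch using Python's built-in `cp1252` codec, which leaves 0x81 undefined, together with the standard `"replace"` error handler. The helper name `decode_then_reencode` is hypothetical.

```python
from typing import Tuple

def decode_then_reencode(raw: bytes) -> Tuple[str, bytes]:
    # Decoding: 0x81 is undefined in Python's cp1252 codec, so the
    # "replace" error handler turns it into U+FFFD.
    text = raw.decode("cp1252", errors="replace")
    # Encoding back: cp1252 has no position for U+FFFD, so the "replace"
    # handler substitutes b"?". The original byte cannot be recovered.
    back = text.encode("cp1252", errors="replace")
    return text, back

text, back = decode_then_reencode(b"caf\xe9 \x81")
```

Here `text` is `"caf\u00e9 \ufffd"` and `back` is `b"caf\xe9 ?"`. The 0x81 is gone for good, and with the default `"strict"` handler, `text.encode("cp1252")` raises `UnicodeEncodeError` instead.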

        Rick



This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:20:50 EDT