Re: Microsoft input method, 950, and Unicode mapping

From: Kenneth Whistler (kenw@sybase.com)
Date: Tue Dec 18 2001 - 19:59:32 EST


Tex,

>
> Thanks for this and the several private responses.
>
> For anyone interested, in addition to the Microsoft page:
> http://www.microsoft.com/hk/hkscs/
>
> The HK Gov't has a web page, fonts and mapping tables:
> http://www.info.gov.hk/digital21/eng/hkscs/introduction.html

And to add to the chaos and confusion, note that the HKSCS
patch for Windows Code Page 950 does not map exactly the
same as the HK Government mapping table. And that the HK
Government mapping table has at least a couple of blatant
errors in it. And that the HKSCS patch for Windows Code Page 950
(like Code Page 950 without the extension, but even more so)
has duplicate mappings in it that need to be resolved in
order to roundtrip through Unicode. And you have no guarantee
that various vendors' attempts to sort out the HK Government
mapping table and Windows Code Page 950 + HKSCS patch behavior
will themselves produce matching results.
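
To make the duplicate-mapping problem concrete, here is a minimal
sketch of how one might detect duplicates in a legacy-to-Unicode table
and see which codes fail to roundtrip. The function names and the toy
table are assumptions made up for illustration; the values are not a
transcription of the Code Page 950 patch or of the HK Government table.

```python
from collections import defaultdict

def find_duplicates(table):
    """Group legacy codes by Unicode code point; keep groups with more than one code."""
    by_unicode = defaultdict(list)
    for code, ucs in sorted(table.items()):
        by_unicode[ucs].append(code)
    return {ucs: codes for ucs, codes in by_unicode.items() if len(codes) > 1}

def build_reverse(table, preferred=None):
    """Build a Unicode-to-legacy table.

    For a duplicated code point, use the caller's preferred legacy code if
    one is given, otherwise the numerically lowest code (an arbitrary policy).
    """
    preferred = preferred or {}
    reverse = {}
    for code, ucs in sorted(table.items()):
        if ucs in preferred:
            reverse[ucs] = preferred[ucs]
        else:
            reverse.setdefault(ucs, code)
    return reverse

def roundtrip_failures(table, reverse):
    """Return legacy codes that do not come back unchanged after legacy -> Unicode -> legacy."""
    return sorted(code for code, ucs in table.items() if reverse[ucs] != code)

# Hypothetical toy table (legacy code -> Unicode code point), illustration only.
TOY = {0xA140: 0x3000, 0xA2CC: 0x5341, 0xA451: 0x5341, 0xA4A4: 0x4E2D}

if __name__ == "__main__":
    print(find_duplicates(TOY))
    print(roundtrip_failures(TOY, build_reverse(TOY)))
```

Whichever code the reverse table prefers, the other member of each
duplicate pair cannot roundtrip through the same Unicode code point, and
two vendors that pick different winners will disagree on the way back.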

>
> Oracle gave a nice paper at a recent Unicode conference:
> http://www.unicode.org/iuc/iuc18/papers/b19.ppt
>
> It amazes me that in the year 2000, organizations are still creating
> chaos by amending definitions of standards especially code pages,
> without giving the new creation its own name or some other way of
> distinguishing it, and then on top of that creating multiple mapping
> tables.
>
> I understand the desire to get new functionality into users' hands, but
> would it have been a problem to rename either big5 or 950 to something
> like big-6 or big-5hk or 950HK or 951?

Sybase is now supporting "cp950" (+euro, by the way -- another addition
that may or may not be supported in a particular Windows implementation,
depending on date) and a separate "big5hk", so if you interoperate
with Sybase, you should know what you are getting. However, like
everybody else, it is hit or miss for us when a platform or other
data announces itself to us as "cp950" or "big-5", whether it
is with or without the HKSCS extensions.
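
One way to probe that ambiguity on incoming data is to decode the same
bytes under both interpretations and compare. The sketch below uses
Python's built-in "cp950" and "big5hkscs" codecs as stand-ins; they are
Python's tables, not Sybase's "cp950" and "big5hk", and the helper names
are hypothetical.

```python
def decode_under_labels(data, labels=("cp950", "big5hkscs")):
    """Decode the same bytes under each label; None marks a decode failure."""
    results = {}
    for label in labels:
        try:
            results[label] = data.decode(label)
        except UnicodeDecodeError:
            results[label] = None
    return results

def labels_agree(data, labels=("cp950", "big5hkscs")):
    """True only if every label decodes the bytes, and all to the same text."""
    values = list(decode_under_labels(data, labels).values())
    return None not in values and len(set(values)) == 1

if __name__ == "__main__":
    sample = b"\xa4\xa4"  # a common Big5 character, inside the shared repertoire
    print(decode_under_labels(sample))
    print(labels_agree(sample))
```

Agreement only shows that the sample stays inside the shared repertoire;
it cannot tell you which table the sender actually used.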

> So now we can't tell if big-5 or 950 will or won't have this data, or
> even whether Unicode data will have these characters in the private use
> area or elsewhere, or whether software that may be on the other end of
> the pipe supports HKSCS or not, or even if their operating system has
> the patch or not.
>
> Although "that which we call a rose by any other name would smell as
> sweet", calling everything a rose makes it hard to know when you are
> getting a rose.

I think this was all part of a conspiracy for Chinese to catch up
with Japanese, since the Chinese code pages (until now) didn't have
a mess the scale of SJIS. But between HKSCS and GB 18030, they are
making up for lost time.

--Ken

>
> Here's hoping for less chaos in 2002!
> tex


