Re: Making use of UTF-16 area for CJK

From: Martin J Duerst (mduerst@ifi.unizh.ch)
Date: Fri Aug 16 1996 - 10:05:44 EDT


Jake Morrison wrote:

>Martin,
>
>The situation that prompted me to write was this:
>
>The Taiwan government is currently planning an electronic
>document management system to eventually cover all paperwork.
>They are doing the right thing, and want to use open standards,
>SGML and Unicode. Unfortunately, Unicode doesn't cover their
>character requirements by a long shot.
>
>No one wants to write new software for PCs using CNS 11643 and
>its nasty three-byte encoding. They would rather go with a fixed-width
>character set, one that allows easy interchange. One idea is to add the
>whole of CNS 11643 to the as yet unused part of UCS-4.

>(Much better
>than their last idea to overlay the Korean section in UCS-16.)

I wouldn't have minded personally if they had done that, because I am
currently converting my system to use only Hangul Jamo internally. But
from a global viewpoint, that would definitely be a very bad idea.

>A similar
>solution may be necessary for Japan and the PRC.

For the Taiwanese government, the main problem is names. This is
a much smaller problem for Japan, and probably almost
nonexistent for China. Actually, the Taiwanese government is
itself partly responsible for the problem with names. Clerks
who register names don't check them against a list of
existing characters, and don't even ask parents "Do you
really mean that to be a new character, didn't you just want
it to be XXX?", and obvious mistakes cannot be corrected,
as was explained in a previous mail. These are definitely
things that Taiwan could do better, even if they don't want
to interfere with the right of people to make new characters
for their children's names. Also, government officials could
maybe even carefully discourage the composition of new name
characters. Most countries have limitations; the French have
been known to be particularly strict, and here in Switzerland,
officials are required to check names and ask parents for
proof that a name is not made up. Officials also have
to protect the child from parents giving fancy names that
will not be very helpful in later life.

>(Talk about filling up
>the private use area :-).

There should be quite enough of a private area already in UTF-16,
and of course there is enough in UCS-4.
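
As a rough illustration of how much private space UTF-16 offers, the
following Python sketch counts the private use code points in the BMP
(U+E000 to U+F8FF) and those reachable through the private use high
surrogates (U+DB80 to U+DBFF). The constant and function names are
made up for this example; the surrogate-reachable count includes the
last two code points of each plane, which are noncharacters.

```python
# Illustrative sketch: names below are hypothetical, chosen for this example.
BMP_PRIVATE_USE = (0xE000, 0xF8FF)          # private use area in the BMP
PRIVATE_HIGH_SURROGATES = (0xDB80, 0xDBFF)  # high surrogates for private use
LOW_SURROGATES = (0xDC00, 0xDFFF)           # all low surrogates


def span(rng):
    """Number of code units in an inclusive (low, high) range."""
    low, high = rng
    return high - low + 1


def to_surrogates(code_point):
    """Split a code point above the BMP into a UTF-16 surrogate pair."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("code point is not encoded with a surrogate pair")
    offset = code_point - 0x10000
    return 0xD800 + (offset >> 10), 0xDC00 + (offset & 0x3FF)


bmp_private = span(BMP_PRIVATE_USE)
# Each private use high surrogate pairs with every low surrogate.
surrogate_private = span(PRIVATE_HIGH_SURROGATES) * span(LOW_SURROGATES)

print("BMP private use code points:      ", bmp_private)
print("Surrogate-reachable private use:  ", surrogate_private)
print("Total:                            ", bmp_private + surrogate_private)

# The first surrogate-reachable private use code point, as a UTF-16 pair.
high, low = to_surrogates(0xF0000)
print("U+F0000 -> %04X %04X" % (high, low))
```

Whether that is enough for a given repertoire depends on how many of
its characters are already encoded elsewhere.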

>I am all in favor of having a nice, sane, regular character set
>with no duplication. It seems that this will take quite a long
>time though (10 to 15 years?) at the current rate. This is completely
>understandable due to the painstaking work involved, and at a certain
>point character unification becomes impossible due to differing opinions.

The timescale is definitely much shorter. For this year's IRG
meeting in Hong Kong, countries were asked to send enough delegates to
have one for each subcommittee, and there was a whole week of
serious work (as far as I could tell from the messages
I received).

>What to do in the mean time?
>
>The Vietnamese have already added characters to their version of
>ISO 10646. Are each of the CJK countries going to do the same with their
>versions of the standard, without any coordination? Is there some
>middle ground, where the IRG continues its work, while character blocks
>are added (under the control of the Unicode consortium?) to allow
>use by Asian governments?

There is definitely nothing like a "Vietnamese version of ISO 10646".
Apart from the updates through PDAMs, there is only one ISO 10646.
Vietnam has put its characters into the private use zone for the
time being.

>This is a practical question for us. Our X.500 directory supports Unicode,
>but it is still not enough. Maybe this isn't a problem that Unicode can
>solve.

No character standard can solve the problem that new characters get
created.

Regards, Martin.


