Re: Unicode CJK Language Myth

From: Martin J Duerst (mduerst@ifi.unizh.ch)
Date: Tue May 28 1996 - 08:27:38 EDT


Ken'ichi Handa writes:

>cherlin@snowcrest.net writes:

>> Anybody using Chinese and Japanese fonts together might see either the
>> Chinese or Japanese rendering. Presumably someone who uses Chinese fonts
>> has a specific reason for doing so--perhaps the person is Chinese, or is a
>> scholar of Chinese, or is doing business in China--but whatever the reason,
>> that person has to accept that Japanese and Chinese fonts are different,
>> and learn the significant differences.
>
>Why should I write again and again that the difference is beyond what
>is allowed as font variations?

This is not true. JIS X 0208, the basic Japanese standard, for example,
does not disallow such a font variation in any way. The new edition,
JIS X 0208-1996 (or more probably 1997), is even more specific
about this.
This is good because we don't want to artificially limit the creativity
of font designers.

>> So who is left, who could have a valid objection to unifying the glyphs
>> into one character?
>
>I have no idea why mine can't be recognized as a valid objection.

The problem is that your objection is not based on actual in-use
observation, but only on theoretical what-would-happen-if
argumentation, and that you have seen the two variants side-by-side
in a standard before seeing them in actual use. Suppose a newspaper
printed the "wrong" glyph variant in one of its articles. How
many people would recognize the difference? How many people would
have difficulty understanding the text? How many people would bother
writing a letter to the editor, or mentioning that character to a
friend? I am sure all these figures would be very low, unless people
already know about these shape differences and maybe the connection
of them to Unicode. It might then be that there is a huge (but
unsubstantiated) outcry about Chinese or US "cultural invasion".
But if the test is made with people that don't know about these
things, very few will notice, and even fewer will bother.

>> Since no Japanese or Chinese character set standard distinguishes these two
>> glyphs, the conclusion is that nobody feels the need to make the
>> distinction at the character level.
>
>You are saying something like that since ASCII does not contain some
>greek character, ASCII does not distinguish the character `a' from the
>greek character.
>
>It's nonsense to say some character set distinguishing or not two
>characters if one is not included in the set. If we dare to say
>something, a character set distinguishes characters contained in the
>set from all characters not contained in the set.
>
>And, no Japanese character set contains a character which allows
>Chinese `choku' variant. In this sense, a character which allows
>Chinese `choku' variant is different from the Japanese character which
>doesn't allow the variant.

This is not true. See above.
And while for Greek characters, the Greeks definitely think that
an 'a' and an 'alpha' are different, for the character "choku" the
Chinese think that both variants are the same character, and the
Japanese would do so too if they knew the Chinese variant.
There is no way to see the Chinese variant as a different character,
only as a variant that might be more or less known, and more
or less accepted in some typographic situations.

>> How do the objectors to Unicode handle the problem of rendering "choku"
>> now? Either it doesn't present a real problem, or they use separate
>> Japanese and Chinese fonts with incompatible codings. Is this an advantage
>> to anyone?
>
>Very simple. Just use two character sets, Japanese JISX0208 and Chinese
>GB2312 (or/and CNS11643), concurrently. There exists no incompatibility
>as far as we use internationalized encoding methods (ISO-2022-INT and
>X's Compound Text are the examples) and internationalized internal
>character representation (Mule's method and X.V11R5's Xsi method are
>the examples).

The main problem here is that this makes it difficult to find the same
character in texts in different languages. Finding it is an important
aspect of multilingual computing, but is nearly impossible with a
proliferation of national character sets, where the same character has
a different code in each set. Implementers of multilingual information
retrieval are very happy users of Unicode.
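
To make the retrieval point concrete, here is a minimal sketch in Python.
It assumes Python's standard codecs (euc_jp, gb2312, big5) as stand-ins
for the national character sets discussed in this thread; the helper
names and the two sample words are hypothetical and only for
illustration. It shows that "choku" has a different byte sequence in
each legacy encoding, but a single code point (U+76F4) once the text is
decoded to Unicode, so one search works across languages.

```python
# "choku", written as an escape so this source file is encoding-neutral.
CHOKU = "\u76f4"

# Legacy national encodings, using Python's standard codec names
# (an assumption of this sketch, not something named in the thread).
LEGACY_CODECS = ["euc_jp", "gb2312", "big5"]


def legacy_bytes(char):
    """Return the byte sequence of `char` in each legacy encoding."""
    return {codec: char.encode(codec) for codec in LEGACY_CODECS}


def contains(raw, codec, char):
    """Decode legacy bytes to Unicode, then search by code point."""
    return char in raw.decode(codec)


# Hypothetical sample texts: a Japanese word stored as EUC-JP bytes and
# a Chinese word stored as GB2312 bytes. Both contain "choku".
japanese_text = "\u6b63\u76f4".encode("euc_jp")
chinese_text = "\u76f4\u63a5".encode("gb2312")

if __name__ == "__main__":
    for codec, raw in legacy_bytes(CHOKU).items():
        print(codec, raw.hex())
    print(contains(japanese_text, "euc_jp", CHOKU))
    print(contains(chinese_text, "gb2312", CHOKU))
```

Searching the raw bytes directly would need one pattern per character
set, plus knowledge of which set each text uses; after decoding, a
single comparison is enough.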

>> Have I missed something? Is this style of explanation satisfactory to
>> Japanese computer users?
>
>You missed too many things to satisfy us.

For a Japanese computer user, working applications are the best
argument. And I am sure they will appear in the near future, maybe
even without the users noticing. If you do it right, there is nothing
that could be noticed :-).

>We (at least mule) have no technical difficulty handling multiple
>double-byte character sets with more-than-16-bit character code
>internally.

True, you invested a lot of work into these things. Did you ever
count how much work this was? And what other nice things you
could have implemented in that time, e.g. proportional rendering,
real language tagging useful for any language, and so on?
I don't want to criticise you too much because at the time you started
mule, Unicode was not yet available. But this does not mean that
you should not try to see things from a neutral point of view.

Regards, Martin.


