Re: Unicode CJK Language Myth

From: Martin J Duerst (mduerst@ifi.unizh.ch)
Date: Tue May 28 1996 - 07:51:46 EDT


Ken'ichi Handa wrote:

>mduerst@ifi.unizh.ch writes:
>> otherwise, they feel okay with the rest. The problem is
>> that most of them don't realize that given all the different
>> requirements from all over the world, and in particular all
>> the different ways, in particular, of viewing and thinking
>> about Kanji, there are actually so few points in Unicode
>> any single person is not exactly happy about.
>
>I'm not claiming to distinguish ALL the different ways, but claiming
>that Unicode should have distinguished more reasonable variants.
>Although the word "reasonable" is very vague, the current unification
>of Unicode appears unreasonable to many Japanese. If we don't have to
>worry about the possibility of, for instance, character "choku" being
>shown in an unexpected way, more and more Japanese people accept
>Unicode.

If Han unification were presented a little more reasonably by
Japanese "experts" and in the press, acceptance of Unicode in
Japan would not be a problem.
Simply tell the Japanese people that in China the Chinese variant
of "choku" is used interchangeably with the "Japanese" variant,
that this variant is similar to other simplifications that also
occur in Japanese, and that this variant is not supposed to show
up on their screens anyway; if it does, they can blame the system
setup or the software maker. Those who want to know the details
can be told more. Many Japanese are interested in China, and
may be interested to learn such things. And the number of cases
that would need an explanation is very small (if any arise at
all, because the chance of seeing such a character without
having any knowledge of Chinese is extremely low).
Also, the cases where Chinese might have to "endure" unfamiliar
glyph variants to allow the Japanese to use the characters and
glyph variants they are familiar with could be mentioned
to show the Japanese that Unicode is not partial to any of
the nations using CJKV ideographs.
This is much better than crying out "Unicode is wrong. It will
take away our Japanese glyph shapes." as other anti-Unicode
exponents in Japan do, without any justification and usually
while twisting quite a few facts.

>> Okay, I'll start again. If you look at what we might call "multilingual
>> typography", then you find mainly two cases, namely:
>
>> - The case where a few words from another language are incorporated
>> in the text of another language. In this case, there is no font
>> change; glyph differences that may exist between those two
>> languages are eliminated in favor of glyphs in the base language.
>
>Why do you think only the glyphs in the base language is used in such
>cases? Typical cases are that Chinese person's name in Japanese
>context. If we have no economical reason of using only Japanese
>glyphs, we will use correct Chinese glyphs for Chinese names.

What do you mean by "economical reason"? Do you mean that
big newspaper and magazine printing companies would not
be able to afford Chinese variant glyphs for these cases?

The problem here is primarily an *aesthetic* problem. Mixing Chinese-
style glyphs into a Japanese text looks ugly. You may not think
so because you are too fixated on code sets and environments
where font combinations are very restricted. But designers and
typographers know this from experience, and that's why these
texts are printed the way they are. If any economic argument
is involved, it is that rendering these characters with Chinese
glyphs is less economical, because their ugliness in the middle
of a Japanese text diminishes readability.

>> - The case where there is an abrupt change between different languages,
>> such as in a dictionary, in a text to learn a language, or in a
>> scientific paper that uses another language only in examples.
>> In these cases, there is not just a glyph change, but also a
>> font change to make the differences obvious.

>> For structural and typographic reasons, texts where there are changing
>> glyph shapes without font changes are virtually non-existent.
>
>There exist many electronic dictionaries which don't change font.

But hopefully they have some other means of indicating language.
Doing this just by codeset switching is absolutely inappropriate.
In general, each codeset or script can be used by many languages.
A dictionary of equivalents between Kansaiben (Japanese dialect
spoken around Osaka) and standard Japanese is just one of many
examples. Software or data relying on codeset switching for
dictionaries is broken from the start, and not a good
argument.
So font information is not directly necessary, but some kind of
higher-level information is necessary for dictionaries.
And I am sure there are already proposals to standardize
dictionary information, e.g. using SGML. But SGML needs
a single character set, such as Unicode, and is very difficult
if not impossible to use together with code-switching.

>Why does ASCII distinguish `a' and `A'? If we follow your logic, we
>don't need the difference of lower case and upper case. (The merit of
>not-distinguishing lower and upper is great in computation. Don't you
>think so?) EVEN IF WE WRITE ALL TEXT IN UPPER CASE, THERE EXISTS NO
>READABILITY PROBLEM. Actually, the difference between `k' and `K' is,
>in most fonts, smaller than the difference of two `choku's. And, what
>Unicode is doing is something like distinguishing `a' and `A' but not
>distinguishing `k' and `K'.

Definitely not. First, there are languages (such as German) where
the distinction between lower case and upper case is more important
than in English. Second, I am sure that all-uppercase text is
clearly less readable for an untrained English reader than
Japanese Unicode text rendered in a font with Chinese glyph
variants is for a Japanese reader unfamiliar with Chinese glyphs
(given that in both cases familiar fonts, such as Times and
Mincho, are used).
Third, the two things are not equivalent. For both a/A and k/K, all
readers know the difference and separate usage. For "choku", the
Chinese treat both variants as glyph variants, not seeing anything
like a different character in them, and the Japanese use only one,
and don't know the other. The (hypothetical!!!) equivalent of your
request would be that because the Hungarians have a peculiar way
to write/print a k (besides also using the general one), this k should
be coded differently to avoid anybody being confused. This would
not make the Hungarians happy (because for them it is a difference
between fonts, and they don't want to decide which variant to
choose when they write a text) nor would it be of much help to
the rest of the world.

>> So it is fair to conclude that a system such as Unicode, which relegates
>> glyph differences to be resolved by higher-level information such
>> as font information, is a very reasonable solution for multilingual
>> text processing and typography.
>
>So it is fair to conclude that a system such as Unicode can only be
>used for localized text processing but very hard for multilingual text
>processing.

Well, if you change your sentence to "a Unicode-based system is
very hard to use to do high-quality multilingual typography without
additional information such as language or font", then I think
everybody agrees. But that's the same for a system based on
code-switching techniques. The little bit of "advantage" that a
code-switching system may have is paid for with considerable
disadvantages, such as higher implementation costs. You simply
ignore these because you have already implemented it, but in
general these costs cannot be ignored. On the other hand, there
are quite a few things that, in terms of typographic quality, are
on a par with CJKV glyph disambiguation. One example is
proportional fonts. Because of very deep-rooted assumptions in
the system you are working on, such fonts cannot be used.
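
To make "additional information such as language or font" a bit more
concrete, here is a minimal sketch in Python of how a display routine
might pick a glyph style from a language tag while the character code
stays the same. Everything in it is an assumption made for illustration:
the Run type, the font table, the placeholder font names, the default
language, and the choice of U+76F4 as the code point for "choku". It is
not a description of any existing system.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Hypothetical table from language tag to font; the names are placeholders.
FONT_FOR_LANGUAGE = {
    "ja": "Mincho (Japanese glyph shapes)",
    "zh": "Songti (Chinese glyph shapes)",
}

# Assumed system setup for a Japanese user.
DEFAULT_LANGUAGE = "ja"


@dataclass
class Run:
    """A stretch of text plus optional higher-level language information."""
    text: str
    lang: Optional[str] = None  # None: no language information available


def pick_font(run: Run, default_lang: str = DEFAULT_LANGUAGE) -> str:
    """Choose a font from the language tag, falling back to the system default."""
    lang = run.lang if run.lang in FONT_FOR_LANGUAGE else default_lang
    return FONT_FOR_LANGUAGE[lang]


def layout(runs: List[Run]) -> List[Tuple[str, str]]:
    """Pair each run's text with the font that would be used to render it."""
    return [(run.text, pick_font(run)) for run in runs]


# Assumption: U+76F4 is taken here as the unified code point for "choku".
CHOKU = "\u76f4"

if __name__ == "__main__":
    for text, font in layout([Run(CHOKU, "ja"), Run(CHOKU, "zh"), Run(CHOKU)]):
        print(f"U+{ord(text):04X} rendered with {font}")
```

The character code is identical in all three runs; only the language
tag (or, where it is missing, the system default) decides which glyph
style is used. A code-switching system needs the same kind of
information, it just carries it inside the encoding.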

>> We are saying the same, if we assume that software is
>> behaving reasonably and is not implemented by beginners.
>
>I want to say this again: we should not assume any sophisticated
>software just to show a correct glyph.

>> Just open your eyes, and have a look at advertisement and logos
>> around you. If you are more interested, have a look at some books
>> on Japanese logo design and modern typography. I have some such
>> books here, but giving you the ISBN number won't help you as they
>> are somewhat outdated (late 80s) and won't be on sale in Japan
>> anymore.
>
>You are talking about something like calligraphy. No one read a book
>in which all texts are printed in such eccentric fonts. See any
>Japanese magazines. Even though titles are in very eccentric glyphs,
>the body text is printed in glyphs we see in a text at school.
>
>In addition, I believe that no Japanese font variation contains
>Chinese style `choku' glyph. Have you ever seen that?
>
>> I have mentioned "guessing" techniques and setup scenarios to get
>> the best glyph shapes before. If a system is not able to conclude that
>> most probably the user wants to see a Japanese glyph when using
>> a Japanese input method, please don't blame it on the character set.
>
>1) I don't want to use any "guessing" techniques.
>2) Even if you insist on using the word "best", the correct word to be
>used here is "correct", for many Japanese.
>3) I'm not talking about single program. The displaying routine and
>the input method driver may belong to different software packages.
>

>Perhaps, my example was too general and complicated. I should have
>written: If culture A uses both X and Y but want to unify them, and
>culture B uses X but never Y, should X and Y be unified or not?
>
>This is the case of `choku'.
>
>I think there should be two character codes for X of culture A and X
>of culture B because those two should be regarded as different ones,
>the former allows variation Y but the latter doesn't allow.

In this case, we end up with two different codes for what is inherently
the same and what can very well appear the same on screen. Definitely
not a solution with fewer possibilities for confusion than the one we
have now in Unicode!

>> Also, there is no possibility of misunderstanding. If a Japanese
>> sees the Chinese glyph for "choku" without context, and does
>> not recognize it, there is absolutely no danger of confusing it
>> with something else. The only thing (s)he can say is
>> "sorry, I don't know".
>
>... and the receiver may conclude that he is not an educated person.
>
>If he knows that the character is Chinese, he may conclude that his
>friend thinks he is a Chinese.
>
>Are these subtle problem or not? I don't know the answer.

Regards, Martin.


