Re: Unicode CJK Language Myth

From: Kenichi Handa (handa@etl.go.jp)
Date: Thu May 16 1996 - 10:21:51 EDT


mduerst@ifi.unizh.ch writes:
> Ken'ichi Handa wrote:
>> Do you mean that one should set up his environment for his locale?
>> How can a novice user set up appropriate fonts when he borrows his
>> friend's terminal?

> Well, usually you indeed set up your environment to meet your needs,
> which are in general represented by your "locale". Of course, you may
> prefer another setup, e.g. English menu texts instead of localized
> ones.

Actually, I set no locale; I use the default one, which is English. A
locale should be used only to specify user preferences; it should not
have to be selected just to read a character with the correct glyph.

> The problem of setup is a problem of the user interface. In some

If we have to change some setting just to read a character with the
correct glyph, that is a mistake in some design. If it is because of
the character set being used, it is a mistake in the design of the
character set.

> Locale changes in general should not affect the data one works on,
> just the presentation (i.e. number presentation, date presentation).

Yes.
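
A locale should change only how values are presented, never the data
itself. A minimal sketch of that distinction (in Python; the locale
names are system dependent and only illustrative):

    import datetime
    import locale

    d = datetime.date(1996, 5, 16)
    # Locale names differ between systems; these are only examples.
    for name in ("C", "ja_JP.eucJP"):
        try:
            locale.setlocale(locale.LC_TIME, name)
        except locale.Error:
            continue                          # this locale is not installed
        print(name, d.strftime("%x"))         # only the presentation changes
    print(d == datetime.date(1996, 5, 16))    # the data itself is untouched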

> They should also affect the priority of font choices for raw text, not just
> for CJK, but also for other things such as Latin languages.

Yes, but the priority should be changed only among the correct fonts,
just to select a style. The difference between the Chinese and Japanese
variations of `choku' and many other unified CJK characters in Unicode
is beyond style: one is correct and the other is incorrect, and which
one is correct depends on the intention of the writer.

>> All your discussions are based on a bilingual environment, not on a
>> multilingual environment. Switching between Japanese and Chinese fonts
>> just to read a plain text? What happens when a Japanese and a Chinese
>> communicate with each other in mixed Japanese and Chinese text?

> If you look at how mixed Japanese and Chinese is treated in print,
> you will realize that in cases where there is a "major" and a "minor"
> language (e.g. an article about China in a Japanese magazine, where
> most of the text is in Japanese, and only some names or terms are
> Chinese) the characters of the minor language are written with
> glyph shapes used in the major language. In other cases, such

Yes, there surely exists much printed matter which fails to use the
correct shape. But why should we (or Unicode) follow this kind of
failure?

> as dictionaries, where there is about a 1 to 1 mixture of languages,
> these are distinguished not only by using different glyph shapes,
> but also by using different fonts, usually with different weights.

Yes. The correct glyph comes first; the preferred font (style or
weight) comes next. Unicode fails to carry the information about which
glyph variation is the CORRECT one.

> I do not know exactly what kind of mixture you are thinking
> about, but even if you have one sentence in Japanese and one
> in Chinese, it is not difficult to write a little piece of code that
> detects the languages and displays them with appropriate
> glyphs, if this is really what you want.

Why do you keep using the words "appropriate" and "different"? The
current problem is "correct" versus "incorrect". And I believe the
correct glyph is what everyone wants.
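
Such a "little piece of code" is indeed easy to write. A rough sketch
(the heuristic and the font names are only illustrative), which also
shows its limit: a run of Han characters alone carries no clue about
the language of the writer:

    def looks_japanese(text):
        # Crude heuristic: any kana marks the run as Japanese,
        # otherwise assume Chinese.
        return any("\u3040" <= ch <= "\u30ff" for ch in text)

    def font_for(text):
        # Placeholder font names; a real system would consult the
        # user's font setup.
        return "SomeJapaneseFont" if looks_japanese(text) else "SomeChineseFont"

    print(font_for("\u76f4\u63a5"))        # kanji only: guessed Chinese,
                                           # wrong if the writer was Japanese
    print(font_for("\u76f4\u63a5\u306b"))  # kana present: guessed Japanese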

> For single characters
> or words e.g. of Chinese inside a Japanese sentence, I would
> not suggest changing the glyph style anyway, for typographic reasons.

I repeat: it is not a difference of style, just correct or incorrect.
If one writes the left glyph in an examination at a Japanese elementary
school, he fails the examination.

>        *                       *
>  *************           *************
>        *                       *
>    *********               *********
>    *       *               *       *
>    *********               *********
>    *       *     versus    *       *
>    *********               *********
>    *       *             * *       *
>    *********             * *********
>    *       *             *
>  **************          **************

> The character we are discussing here is about the only example
> that might cause real difficulties in the above rare circumstances.

If it is known, why not fix this bug?

> The other cases where unification is frequently criticised,
> such as the "grass" radical or the "bone" radical, do not cause
> any difficulties even for single characters for an average
> Japanese or Chinese.

I admit that most Japanese can understand a character whose radical is
displayed with a Chinese font. But "can understand" and "displayed with
the CORRECT glyph" are different things. Perhaps most Europeans can
read a text in which every letter 'l' is shown as '|' (vertical bar),
but with an unpleasant feeling.

> So one might even take the position (which I don't) that
> unification in Unicode should have gone further!

I take that position for some Latin and Cyrillic characters, and for
the Japanese Katakana characters (full width and half width). That
famous round-trip rule is of no use when it is applied to a single
character set in isolation. No one uses a character set directly.
Everyone uses single or multiple character sets while encoding them in
some way. So, if the round-trip rule cannot assure identity after the
round trip:
        Encoding A -> Unicode based encoding -> Encoding A
we had better get rid of it. For instance, it cannot assure identity
after:
        X's Compound Text -> Unicode based encoding -> X's Compound Text
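
A small illustration of that information loss (sketched in Python with
its standard codecs): the same character `choku', taken once from a
Japanese charset and once from GB 2312, becomes the identical Unicode
code point, so a converter going back to Compound Text no longer knows
which charset, and hence which national glyph convention, to restore:

    choku = "\u76f4"                        # the `choku' character discussed above
    as_japanese = choku.encode("euc_jp")    # bytes from the JIS X 0208 repertoire
    as_chinese = choku.encode("gb2312")     # bytes from the GB 2312 repertoire
    print(as_japanese, as_chinese)          # two different byte sequences
    # Both decode to the very same Unicode string; the distinction is gone.
    assert as_japanese.decode("euc_jp") == as_chinese.decode("gb2312") == choku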

---
Ken'ichi HANDA
handa@etl.go.jp


