Re: Unicode CJK Language Myth

From: Kenichi Handa (handa@etl.go.jp)
Date: Thu May 16 1996 - 22:10:02 EDT


mduerst@ifi.unizh.ch writes:
> For most characters, the variation of what is to be considered correct
> and what incorrect is very unclear. ...

I agree. And I've never claimed that all the unifications done by
Unicode are incorrect. Most of them are OK, I think.

> For a trained eye, and on close
> examination, some shapes in some fonts look clearly ugly even if

For the example of `choku', we need neither a trained eye nor close
examination.

> You might call it a failure, but it is not. It is accepted typographic
> practice, and as most such practices, it has very good reasons.

Again, I didn't claim that all of them were failures. But there
surely exist many failures. How can you claim that none of them are
failures?

> The same is done in Europe, you would never use a typical
> French font for a single word in an English or German text.
> Unless it reflects structure, such as in a dictionary, a multifont/
> multiglyph hodgepodge looks very bad and is not easily readable.
> Unicode is not following a failure, it is helping to do the right thing!

I'm not claiming here that one is typical and the other is not.

>>> as dictionaries, where there is about a 1 to 1 mixture of languages,
>>> these are distinguished not only by using different glyph shapes,
>>> but also by using different fonts, usually with different weights.
>>
>> Yes. Correct glyph comes first, preferred font (style or weight) comes
>> next. Unicode fails to send information of variations of CORRECT glyph.

> If you have to mark up the text in a dictionary anyway to guarantee
> structure and readability, and this will guarantee the display of what
> you call correct glyph, why do you worry that much?

Sorry, I can't follow your logic. Could you explain it in another way?

> See above. Correctness is a very relative term. Also, writers rarely
> care that much about "correct" glyphs, otherwise most people
> would write much more carefully and clearly. A case in point is

I agree, but there surely exists some boundary between acceptable
variants and incorrect ones. For most Japanese, I believe, the
example of `choku' is a case where unification has gone too far.

> So what a writer wants to read is "choku", and if the computer
> at the other end can display it in a form that the reader
> at the other end can read, the writer will be happy with it.
> Whether this is a Japanese with a usual Japanese setup,
> a Chinese that prefers to read Japanese texts with the
> glyphs (s)he is used to, or by rare chance a Japanese
> that sees the wrong glyph but might not even notice
> it in context, is not something the writer usually cares about.

No. Most writers want a character they enter and see on their own
display to be displayed/printed with one of the acceptable glyph
variants on the reader's display/printer. In the case of handwritten
text, a writer knows that the reader sees exactly what he writes, and
it is his responsibility to write a readable glyph. In the case of
electronic communication, it is the responsibility of the character
set.

> Japanese elementary school, as said above, does not reflect
> typographic reality and possibility. If you take Japanese elementary
> school as a standard, there is much more wrong in today's
> Japanese printed material than the single character in 20'000
> we are discussing here.

Could you give some examples?

Anyway, I believe the example of `choku' is a failure by any standard
of Japanese. And as long as Unicode doesn't admit that it is a
failure, the same failure may occur in the future. This is a bigger
problem than the mere fact that the current Unicode contains a few
bugs.

> It's not a bug. Chinese, as far as I know, are familiar with the
> right variant, although they use the left one more often.
> For them, it would be very strange to have separate codepoints.
> And most Japanese won't see the left form anyway, or recognize
> it immediately if they happen to see it in context.

Hmm, you have come to the key point of the unification problem.
Please ask any Japanese whether he thinks it is a bug in something
(without knowing exactly whether to blame the display handler, the
font, the input method, or the character set itself) when he sees a
Chinese glyph after entering `choku' with a Japanese input method.
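
To make the crux concrete: assuming `choku' here is the unified
ideograph U+76F4 (my reading of the example), the encoded text
carries a single code point and no record of which culture's glyph
the writer intended. A minimal sketch in Python:

    # One code point is all that survives transmission; whether the
    # Japanese or the Chinese glyph appears depends entirely on the
    # font the READER's system happens to select.
    text = "\u76f4"            # `choku', assuming it is U+76F4
    print(len(text))           # 1 -- no room for the J/C distinction
    print(hex(ord(text)))      # 0x76f4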

I have claimed in many places (though perhaps not on this mailing
list) that unifying characters across two different cultures poses a
potential difficulty, especially in the case of ideograms. If culture
A wants characters X and Y unified but not with Z, and culture B
wants X and Z unified but not with Y, what kind of unification is
good? I think that X (and Y and Z as well) for culture A and X for
culture B are different characters even if they have exactly the same
glyph.
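
A minimal sketch of that conflict, assuming unification behaves as an
equivalence relation over characters (X, Y, and Z stand for the
hypothetical characters above):

    # Union-find over characters.  Honoring BOTH cultures' wishes
    # forces X, Y, and Z into one class by transitivity, although
    # neither culture wanted Y and Z unified.
    parent = {}

    def find(c):
        parent.setdefault(c, c)
        while parent[c] != c:
            c = parent[c]
        return c

    def unify(a, b):
        parent[find(a)] = find(b)

    unify("X", "Y")   # culture A: X and Y are one character
    unify("X", "Z")   # culture B: X and Z are one character
    print(find("Y") == find("Z"))   # True -- unwanted by both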

> Most Europeans, if they are not computer scientists, have never
> seen such a vertical bar. They will not notice it is supposed
> to be something else than an "l". Most Europeans, for the
> long time they were using typewriters, were used to having
> exactly the same shape for "1" and "l", and for "0" and "O".

Then why did they want to put different code points on them? Which is
better for a computer environment: keeping "1" and "l" at the same
code point, or not? Your example shows that characters should not be
unified merely because they look similar and in most cases cause no
readability problem.
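
Indeed, the character sets we already use keep these look-alikes
apart. A quick check against the Unicode character database:

    # Typewriter-era look-alikes that nevertheless carry different
    # meanings, and therefore different code points.
    import unicodedata

    for ch in "1l0O":
        print(ch, hex(ord(ch)), unicodedata.name(ch))
    # 1 0x31 DIGIT ONE
    # l 0x6c LATIN SMALL LETTER L
    # 0 0x30 DIGIT ZERO
    # O 0x4f LATIN CAPITAL LETTER O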

---
Ken'ichi HANDA
handa@etl.go.jp


