Re: The "wrong" font (was RE: Japan opposes...)

From: Glen Perkins (
Date: Tue May 02 2000 - 06:15:31 EDT

I've spent a lot of time with Koreans in Japan. I can assure you that
Koreans are able to read their names in Japanese fonts, and that the
Japanese make no effort to use Korean fonts when dealing with Koreans in
Japan (names are usually written in kanji with katakana rubi). For any
official documentation,
the "Japanese way" is the right way in Japan, as far as the Japanese
officials are concerned (not unreasonably). In other words, from their point
of view, it's the "same characters", just written in the standard (Japanese)
way instead of some variant (original Korean) way. The important point is
that even they usually claim that they are the "same characters". The only
issue is how they should be properly written. In computer technology, that
maps to "same code point, different font".

When a Korean or a Chinese sees his name in a Japanese newspaper, he sees it
in a Japanese font. If you ask any typical Japanese about why one of those
characters is written that way instead of the "authentic" way, they'll
answer that that's how they write "that character" in Japan. It is a
Japanese newspaper, after all, as they'll point out.

In other words, again, they're implicitly acknowledging that
characters written differently in China, Japan, and Korea can still be the
"same character, just written differently".

With computers, then, you record "which character" with the encoding, then
choose an appropriate font for "how it's written".
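
As a minimal sketch of the "which character" half of that split, the Python
below takes one character that is common in Korean, Japanese, and Chinese
surnames (U+91D1, chosen here purely as an illustration) and shows that the
legacy encodings store it as different bytes that all decode to the same
code point. It assumes Python's standard shift_jis and euc_kr codecs. How the
character is drawn is not recorded anywhere in these bytes.

```python
# U+91D1: read "Kim" in Korean, "Kin" or "Kon" in Japanese, "Jin" in Chinese.
ch = "\u91d1"

# The same character in two legacy national encodings and in UTF-8.
sjis_bytes = ch.encode("shift_jis")
euckr_bytes = ch.encode("euc_kr")
utf8_bytes = ch.encode("utf-8")

# The byte sequences differ between the legacy encodings...
bytes_differ = sjis_bytes != euckr_bytes

# ...but decoding any of them yields one and the same code point.
decoded = {
    sjis_bytes.decode("shift_jis"),
    euckr_bytes.decode("euc_kr"),
    utf8_bytes.decode("utf-8"),
}
code_point = f"U+{ord(ch):04X}"
```

After this runs, decoded holds a single character, which is the sense in
which the encoding answers only "which character".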

Having done a lot of CJK work with Unicode, I've only encountered one
problem in practice, and you have to see it, not in isolation, but in the
context of the alternatives. When the encoding is Shift-JIS, for example,
you can be pretty sure that the language is Japanese, and you can
automatically use a Japanese font. When it's EUC-KR, you can be pretty sure
it's Korean and automatically use a Korean font. That's okay, but it either
limits you to monolingual documents, or else it requires you to support
multiple, mixed encodings in the same system. Supporting mixed encodings is
so tricky that it's almost never done.

When it's some Unicode encoding (say, UTF-8), you can't be sure *on the basis of the
encoding alone* which language it is. Well, I can hear everyone saying, when
it's Latin-1, you can't tell whether it's English, Spanish, German, etc.,
either, on the basis of the encoding alone. True, but picking a
language-specific font is much less of an issue for these languages. So,
yes, there is an issue here, but it's the only one that ever seems to
matter, in my experience. It only matters if you work in a multilingual
environment. Just as in the previous example, though, it goes away if you
limit yourself to monolingual support, or it sometimes (but not always)
requires extra engineering for multilingual support. Language tags, or some
other such mechanism, are analogous to, but usually far simpler than, the
multiple switchable encodings that serve as surrogates for language tags in
automatic font selection.
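
To make the comparison concrete, here is a rough sketch of both approaches
in Python. The font names, the two lookup tables, and the (language tag,
text) run format are all invented for illustration and do not correspond to
any real system.

```python
# Hypothetical font names; substitute whatever fonts are actually installed.
FONT_BY_ENCODING = {
    "shift_jis": "Example Mincho JP",
    "euc_kr": "Example Batang KR",
}
FONT_BY_LANGUAGE = {
    "ja": "Example Mincho JP",
    "ko": "Example Batang KR",
    "zh": "Example Song CN",
}

def font_from_encoding(encoding):
    """Legacy approach: the encoding stands in for the language,
    so a document gets exactly one font."""
    return FONT_BY_ENCODING[encoding.lower().replace("-", "_")]

def fonts_for_runs(runs, default_lang="ja"):
    """Unicode approach: each run carries its own language tag,
    so one document can mix fonts without mixing encodings."""
    styled = []
    for lang, text in runs:
        primary = lang.split("-")[0].lower()  # "ko-KR" -> "ko"
        font = FONT_BY_LANGUAGE.get(primary, FONT_BY_LANGUAGE[default_lang])
        styled.append((font, text))
    return styled

# A Japanese label ("patient: ") followed by a Korean surname, all Unicode.
document = [("ja", "\u60a3\u8005: "), ("ko", "\u91d1")]
styled = fonts_for_runs(document)
```

The second function needs nothing more than a tag per run, whereas giving
the first approach the same capability would mean switching encodings in the
middle of a document.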

By the way, in current practice, a Korean name in a Japanese medical system
is NOT going to be encoded in a Korean encoding. It's going to be in
Shift-JIS or some other Japanese encoding, just like the Japanese names, so
the encoding isn't going to solve the display problem, with or without
Unicode. It's a Japanese computer running a Japanese OS operated by Japanese
staffers in almost all such cases. Current real-world systems seldom mix
encodings anywhere in the world. How are Russian names handled in US or UK
hospital computers? Do the receptionists change encodings to KOI-8 and enter
them in Cyrillic in the waiting room, then feed the records into
mixed-encoding hospital mainframes? Hah. It is to laugh. ;-) They romanize
it. In fact, Sr. Muñoz is going to be lucky to get an 'ñ' in his name
instead of an 'n' in New York, even though 'ñ' is an ordinary Latin-1
character. Non-roman characters get romanized. Similarly, Japanese
receptionists don't learn how to use a Korean input method. There isn't one
on their computers, anyway.

If you want to continue to represent Korean names with Japanese fonts in the
Japanese medical system, it doesn't matter whether they're encoded in
Shift-JIS or Unicode. There's no advantage to the former, and there are a
number of nice advantages to the latter in terms of size of character set,
XML conformance, etc., even when dealing with pure Japanese text. If they
decide to start honoring the Koreans in Japan -- by using a Korean font for
Korean names in the national health system, for example -- then there are
definitely easier ways to do it than by mixing text encodings. Basing the
font on a "Kokuseki" (Nationality) field in the database or using language
tags or any of several other techniques would all be simpler than updating
the software *and every system and application it is going to share data
with* to handle mixed encodings.
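
A sketch of the "Kokuseki" field idea in Python might look like the
following. The record layout, the two-letter nationality codes, and the font
names are assumptions made up for this example, not a description of any
real health system's schema.

```python
from dataclasses import dataclass

# Hypothetical mapping from a nationality code to a display font.
FONT_BY_KOKUSEKI = {
    "JP": "Example Mincho JP",
    "KR": "Example Batang KR",
    "CN": "Example Song CN",
}
DEFAULT_FONT = "Example Mincho JP"  # the "Japanese way" stays the default

@dataclass
class PatientRecord:
    name: str      # always stored as Unicode, whatever the nationality
    kokuseki: str  # hypothetical nationality code such as "JP" or "KR"

def name_font(record):
    """Pick the font for a patient's name from the nationality field."""
    return FONT_BY_KOKUSEKI.get(record.kokuseki.upper(), DEFAULT_FONT)

# The same code point (U+91D1) appears in both names; only the font differs.
japanese_patient = PatientRecord(name="\u91d1\u7530", kokuseki="JP")
korean_patient = PatientRecord(name="\u91d1", kokuseki="KR")
```

The stored text and its encoding are the same for every patient. Only the
rendering step consults the extra field, so systems that share the data and
ignore the field keep working unchanged.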

Unicode is the simpler solution, even when it requires language tags or
other techniques. Mixed encoding systems are complex and error prone. They
never became popular, and they are on the way out.

__Glen Perkins__
