Yet another Unihan Q (was Re: Comments on <draft...)

From: Adrian Havill (havill@threeweb.ad.jp)
Date: Thu Jun 05 1997 - 22:22:51 EDT


Jenkins wrote:
> FACT. It is true that some Unihan characters are typically written
> differently within the Japanese, Taiwanese, Korean, and Mainland Chinese
> typographic traditions.

I'm 90% sure I follow you on this, but I'd like to confirm. Forgive me
for using Japanese terms (and a non-Unicode character set! (^_^)), but
it's all I know/have!

Are you referring to 新字体 (Shinjitai) {new character shapes--
character shapes changed in Japan sometime around 1945 due to national
language reforms}, 旧字体 (Kyuujitai) {old character shapes-- what the
"new" characters were before they were changed}, 簡体字 (Kantaiji)
{simplified character shapes-- usually refers to the PRC's
language-reform characters}, and 繁体字 (Hantaiji) {"luxurious"/complicated
character shapes-- usually refers to the "unsimplified shapes"-- often
with Taiwan, Hong Kong, Korea, etc. in mind}?

> E.g., the official "Taiwanese" glyph for U+8349 ("grass") per ISO/IEC
> 10646 uses four strokes for the "grass" radical, whereas the PRC,
> Japanese, and Korean glyphs use three. As it happens, Apple's LiSung
> Light font for Big Five (which follows the "Taiwanese" typographic
> tradition) uses three strokes.
>
> (This is easily confirmed by accessing
> http://www.unicode.org/unihan/unihan.acgi$8349.)

Not so easily confirmed (;_;)-- took me a while to get the CGI program
to deliver. Unicode's server seems popular/busy. Also, if ISO/IEC 10646
uses four strokes, why does the Unicode version use three (according to
the CGI script)? I was under the impression that they should be the
same.

Referring to the "Three-Dimensional Conceptual Model" (TUS 2.0, Figure
6-25), and the rules listed in Table 6-24, does the four-stroke "grass"
(radical #140) versus the three-stroke version prevent this character
from being unified? In other words, while TUS 2.0 has only the 3-stroke
form of radical #140, would the characters that use the 4-stroke version
be added to TUS later? Or would duplicating all the characters which
use this particular radical in a 4-stroke version add too many
characters and not be justified (as modern Japanese, etc., uses the
three-stroke version)?
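
To make the unification point concrete, here is a minimal sketch (an illustration only, not anything taken from TUS). It uses Python's standard codecs to show that the byte sequences the Japanese, Taiwanese, and PRC legacy character sets use for "grass" all decode to the single code point U+8349. The helper names and the choice of three codecs are assumptions made for this example; nothing in plain text records whether the glyph has three or four strokes.

```python
import unicodedata

GRASS = "\u8349"  # U+8349, the "grass" character discussed in this thread

# Legacy national encodings, by Python codec name (an illustrative selection).
LEGACY_CODECS = ["shift_jis", "big5", "gb2312"]

def legacy_bytes(ch):
    """Map each legacy codec name to the bytes it uses for ch."""
    return {codec: ch.encode(codec) for codec in LEGACY_CODECS}

def unified_code_points(ch):
    """Decode each legacy byte sequence and collect the resulting characters."""
    return {data.decode(codec) for codec, data in legacy_bytes(ch).items()}

if __name__ == "__main__":
    for codec, data in legacy_bytes(GRASS).items():
        print(codec, data.hex())
    # Every legacy form lands on the same code point and character name.
    print("U+%04X" % ord(GRASS), unicodedata.name(GRASS))
```

Which glyph variant actually appears is then up to the font, not the encoded text.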

> FACT. Han unification allows for the possibility that a Japanese user
> might be required to use a Chinese font to display some Japanese text
> (e.g., if it uses a rare kanji).

I hate to be obtuse, but I'm confused. By "[using] a Chinese font" to
"[use] a rare kanji", do you mean:

- use another font to get a rare kanji, but having to accept the fact
that certain parts of the character in the Chinese font may not be
consistent with the rest of the Japanese text (i.e. having to accept
that the character in the other font uses a 4-stroke "grass" radical
(#140) instead of the 3-stroke one that the surrounding characters may
use, causing the oddball radical to stand out), or

- use another font to get a rare kanji, but having to accept the
typeface difference (the Z-axis in the 3-D model) that would cause the
characters to stand out from the surrounding characters. (A rough
analogy would be having the letters "g", "d", and "j" in "jackdaws love
my big sphinx of quartz." * set in Arial but the rest of the sentence in
Helvetica, where the "g", "d", and "j" are the 'rare kanji'.)

Note that just like the English example, substituting characters from a
different font in the middle of another font works (Navigator
3.0 did this for its Unicode Java fonts) but looks awful. Can't wait
for Bitstream/Dynalab to perfect their uniform full CJK Han fonts
(Cyberbit and Co.) to solve this problem.
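
For what it's worth, the kind of per-character font substitution described above can be sketched roughly as follows. This is only a guess at the general technique, not how Navigator or any particular renderer does it; the font names and coverage sets are hypothetical and made up for the example.

```python
from itertools import groupby

def assign_fonts(text, fonts):
    """Split text into runs, each tagged with the first font that covers it.

    fonts is an ordered list of (name, coverage) pairs, where coverage is a
    set of characters; earlier fonts are preferred. Characters that no font
    covers are tagged with None.
    """
    def pick(ch):
        for name, coverage in fonts:
            if ch in coverage:
                return name
        return None

    # groupby merges consecutive characters that picked the same font.
    return [(name, "".join(run)) for name, run in groupby(text, key=pick)]

if __name__ == "__main__":
    # Hypothetical fonts: the Japanese one lacks the rare character U+9F98.
    fonts = [
        ("JapaneseFont", set("\u65e5\u672c\u8a9e\u8349")),
        ("ChineseFont", set("\u65e5\u672c\u8a9e\u8349\u9f98")),
    ]
    for name, run in assign_fonts("\u65e5\u672c\u9f98\u8a9e", fonts):
        print(name, run)
```

Every run that falls through to the second font is exactly where the typeface (and possibly the radical shape) changes mid-line, which is the "looks awful" effect.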

In other words, are you saying that even if the user occasionally mixes
Chinese fonts with Japanese fonts for a Japanese document encoded in
plain-text Unicode, he/she should expect a change on the Z-axis but
should usually (depending on how common the character is) be able to
control/select the abstract shape (the Y-axis) of the character, with
certain exceptions such as unavailability of the CNS 11643 "Y-axis
variant" of U+8349?

* Geeky fact: One of the shortest (known) sentences in English with all
the letters of the English alphabet. For those that think "the quick
brown fox..." is too long.

-- 
Adrian Havill <URL:http://www.threeweb.ad.jp/>
Engineering Division, System Planning & Production Section


