Re: Synthetic scripts (was: Re: Private Use Agreements and Unapproved Characters)

From: Dan Kogai (dankogai@dan.co.jp)
Date: Fri Mar 15 2002 - 12:15:22 EST


On Friday, March 15, 2002, at 08:48 , Marco Cimarosti wrote:
> O, no! At least one of them has a (super)natural origin: CJK ideographs
> came carved on the shell of a gigantic turtle which appeared in dream to
> Cang Jie. :-)

   That reminds me of the fact that Hanzi (Kanji in Japanese) can
generate new characters simply by combining 'radicals' ('Bushu' in
Japanese). Put 'heart' (心) next to 'life' (生) and you get 'sex' (性),
for instance.
   Unlike the Roman alphabet, which is relatively static, the Kanji
repertoire is very dynamic. So I can't help asking you guys this
question: how will Unicode cope with this kind of dynamically changing
character set?
   So far Kanji users have gotten by with a limited set of encoded
characters, not because they are content with the current set but
because it is so hard to push even one character into it. When the
Japan Industrial Standard (JIS) upgraded JISX0208 (first fixed in 1978,
aka Old JIS) in 1990 (New JIS), it created big chaos. And new chaos is
likely to arise with JISX0212-1990 being upgraded to JISX0213-2000.
   You may say this can be resolved by regarding each Kanji not as a
character but as a word (lexically speaking this does make sense) and
then using some sort of ligature to represent it. That way you can
reduce the number of code points to the number of Bushu.
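
   To make the 'word built from Bushu' idea concrete, here is a minimal
sketch. It assumes Unicode's Ideographic Description Characters (U+2FF0
and friends) as the glue, which is my choice for illustration and not
something proposed above; the name describe_left_right is hypothetical.
Note that such a sequence only describes a character: it does not render
as one glyph and is not equivalent to the encoded character.

```python
import unicodedata

# U+2FF0 IDEOGRAPHIC DESCRIPTION CHARACTER LEFT TO RIGHT (assumed glue)
IDC_LEFT_TO_RIGHT = "\u2FF0"

def describe_left_right(left: str, right: str) -> str:
    """Build a description sequence for two components set side by side."""
    return IDC_LEFT_TO_RIGHT + left + right

# U+5FC4 is the left-hand form of 'heart' (U+5FC3); U+751F is 'life'.
sequence = describe_left_right("\u5FC4", "\u751F")
precomposed = "\u6027"  # the single encoded character from the example

print(len(sequence), len(precomposed))   # code points used by each form
print(sequence == precomposed)           # the two are distinct strings
# Normalization does not fold the description into the encoded character.
print(unicodedata.normalize("NFC", sequence) == precomposed)
```

The trade-off is visible even at this size: three code points instead of
one, and no equivalence with text that already uses the encoded form.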
   But this approach already failed when Unicode 2.0 decided to give
all theoretically possible Hangul syllables distinct code points,
unlike Unicode 1.0, which used a ligature model to represent one
character. As a result Hangul now has even more code points than
Traditional Chinese. With this the Unicode Consortium has lost a good
reason to reject new proposals to add more characters. If Elvish gets
code points, why shouldn't real, living languages get more?
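
   To put a number on 'all theoretically possible Hangul', here is a
minimal sketch. It assumes the arithmetic layout of the precomposed
syllable block that starts at U+AC00 (19 leading consonants, 21 vowels,
28 trailing slots counting 'none'). The name compose_hangul is made up
for illustration.

```python
import unicodedata

S_BASE = 0xAC00          # first precomposed Hangul syllable
L_COUNT, V_COUNT, T_COUNT = 19, 21, 28

def compose_hangul(lead: int, vowel: int, trail: int = 0) -> str:
    """Map jamo indices (0-based, trail 0 = no final) to one syllable."""
    if not (0 <= lead < L_COUNT and 0 <= vowel < V_COUNT
            and 0 <= trail < T_COUNT):
        raise ValueError("jamo index out of range")
    return chr(S_BASE + (lead * V_COUNT + vowel) * T_COUNT + trail)

# Every combination gets its own code point.
TOTAL_SYLLABLES = L_COUNT * V_COUNT * T_COUNT

# The same syllable written as three conjoining jamo (the 'ligature' style)
# composes to the single precomposed code point under NFC.
jamo_sequence = "\u1112\u1161\u11AB"
print(TOTAL_SYLLABLES)
print(compose_hangul(18, 0, 4) == unicodedata.normalize("NFC", jamo_sequence))
```

Both representations still coexist in Unicode; normalization is what
ties them together.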
   CJK made the greatest compromise, one that has hardly paid off, when
Unicode was first created. They accepted code point sharing even though
it hardly makes sense linguistically. Then came Unicode 2.0 and the
Hangul expansion, then surrogate pairs. What's next? Making Unicode
128-bit like an IPv6 address so you can include Tengwar and Klingon with
less objection? I can't help but say: give me a break.
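
   For anyone who has not looked at what a surrogate pair actually is,
here is a minimal sketch of the UTF-16 arithmetic, assuming the usual
ranges (high surrogates from U+D800, low surrogates from U+DC00). The
name to_surrogate_pair is hypothetical.

```python
def to_surrogate_pair(code_point: int) -> tuple[int, int]:
    """Split a code point beyond U+FFFF into (high, low) UTF-16 units."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("only code points U+10000..U+10FFFF need a pair")
    offset = code_point - 0x10000          # 20 bits remain
    high = 0xD800 + (offset >> 10)         # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)        # bottom 10 bits
    return high, low

# Cross-check against Python's own UTF-16 encoder for U+20000.
high, low = to_surrogate_pair(0x20000)
encoded = chr(0x20000).encode("utf-16-be")
print(hex(high), hex(low), encoded.hex())

# Code space reachable this way: 17 planes of 65,536 code points each.
CODE_SPACE = 17 * 0x10000
```

So the ceiling is a little over a million code points, a long way short
of 128 bits.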
   I confess I enjoyed this thread on whether Tengwar should be included
in Unicode. It's fun. It's cute. But isn't this too much for those
who accepted the compromise for UNIcode? Tengwar should wait until more
critical issues are resolved. Many (including me) would be pissed if
Tengwar were added BEFORE Ciao-Ciao's poetry and Man-Yo-Shu become
encodable in Unicode.
   Well, it may take decades, if not centuries, for Tengwar, Klingon and
others to get a chance, but so what? They won't go away even after all
of us here are dead.

Dan the Man with Too Many Things to Encode Already


