My problems with CJK Unicode are:
1) that one Han 'character' is often mapped to two, three, or
more code points. In other words, CJK unification didn't go
far enough.
2) that a Han 'character', i.e. a lexicographic unit (lexeme, a
dictionary entry), is confused with a 'character' of Latin
script, or of a syllabic script, like kana. A character of a
Latin script or a syllabic script goes to make up a
lexicographic unit (a word, usually). The corresponding
animal for Chinese would be the graphemes that go to make up
the lexicographic unit. (The best choice for these might be
the 2000 or so hemigrams (half graphs) that either stand
alone as a 'lexeme' or combine with each other to compose all
Chinese 'lexemes'.)
3) that, because the elements of the script (the graphemes or
the hemigrams) were not encoded as the 'characters' of
Chinese, the majority (in number, not in frequency of use) of
Chinese lexemes cannot be represented in Unicode without
recourse to the private use area, and even then thousands
will still be left out.
I realize there are practical reasons for the above state of
affairs regarding Unicode CJK, but the above problems
remain.
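
As an illustration of point 1, here is a minimal Python sketch using
the standard unicodedata module. The characters chosen (U+6236,
U+6237 and U+6238, commonly treated as regional forms of one
character) are an assumed example, not one named above. The sketch
only shows that normalization leaves such variants as separate code
points, whereas a CJK compatibility ideograph does fold into its
unified counterpart.

```python
import unicodedata

# Assumed example: three separately encoded forms of one Han character.
variants = ["\u6236", "\u6237", "\u6238"]

# Even compatibility normalization (NFKC) does not merge these forms;
# each one remains its own code point.
normalized = {unicodedata.normalize("NFKC", ch) for ch in variants}
print(len(normalized))  # 3

# By contrast, a CJK compatibility ideograph has a canonical mapping
# to a unified ideograph, so normalization replaces it.
compat = "\uF900"
unified = unicodedata.normalize("NFC", compat)
print(f"U+{ord(compat):04X} -> U+{ord(unified):04X}")  # U+F900 -> U+8C48
```

Software that wants to treat such variants as one 'character' has to
bring its own mapping table; normalization alone will not do it.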
On a lighter note, I, for one, am extremely pleased that the
Unicode Han index was arranged according to the Kangxi system of
214 classifiers. This is the one system that is shared by all
regions throughout the kanji culture realm and was the proper
choice.
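
For readers who want to see the 214 classifiers from code: Unicode
also encodes the Kangxi radicals themselves as a block, U+2F00 through
U+2FD5. The short Python sketch below, again relying only on the
standard unicodedata module, lists that block and counts its entries.
It shows the radicals block only, not the radical-stroke index itself,
and the variable names are illustrative.

```python
import unicodedata

# Kangxi Radicals block: U+2F00 through U+2FD5 inclusive.
KANGXI_START, KANGXI_END = 0x2F00, 0x2FD5

radicals = [chr(cp) for cp in range(KANGXI_START, KANGXI_END + 1)]

print(len(radicals))                  # 214
print(unicodedata.name(radicals[0]))  # KANGXI RADICAL ONE
```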
Jon
-- Jon Babcock <jon@kanji.com>
This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:21:01 EDT