RE: Cangjie-Unicode table

From: Marco.Cimarosti@icl.com
Date: Mon Jan 31 2000 - 13:37:44 EST


Richard Kunst wrote:
>Does there exist a table of Cangjie codes for the full Unicode 2.0 (or
>better still, 3.0) Han character set? I would be grateful if anyone could
>point me to any fairly comprehensive table(s) they know of.
>...
>I currently have about 21,000 various Cangjie codes for about 18,500 Unicode
>2.0 and 3.0 Han characters, plus some non-yet-Unicode characters.

Your list is probably the most complete in existence! If you put it in the
public domain, I will be one of the first downloaders.

John Jenkins replied:
>I'm not aware of any but we'd be glad to put it into the Unihan database
>if we could get a sufficiently reliable set.

This would be great, but it requires a very elastic definition for
"sufficiently reliable".

There can be no correct, universal or unambiguous Cangjie. CJ is a very
arbitrary way of building short mnemonics for ideographs. Comparing two
different Cangjie dictionaries, even for the same character set (there are
some around for GB or Big-5), one notices that the same ideograph is often
represented by different sequences.

So why not get a few public-domain Cangjie lists (e.g.
http://ftp.cityu.edu.hk/pub/chinese/ifcss/data/cangjie-table.b5) and simply
pour them into the Unihan database?

Of course, such a process should allow for the fact that some ideographs
have more than one Cangjie sequence (because they had different sequences
in different source lists) and that some others have none (because no one
ever bothered providing one).
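
To make the idea concrete, here is a minimal sketch in Python of such a
merge. It assumes each source list has already been parsed into a mapping
from a character to its list of sequences; the function names and the sample
data are hypothetical placeholders, not real Cangjie codes or any existing
Unihan tool.

```python
from collections import defaultdict


def merge_cangjie_lists(sources):
    """Merge several {character: [sequences]} tables into one.

    A character keeps every distinct sequence found in any source,
    in the order the sequences were first seen.
    """
    merged = defaultdict(list)
    for table in sources:
        for char, sequences in table.items():
            for seq in sequences:
                if seq not in merged[char]:  # skip exact duplicates
                    merged[char].append(seq)
    return dict(merged)


def lookup(merged, char):
    """Return all known sequences for char, or an empty list if none."""
    return merged.get(char, [])


# Placeholder data, invented for illustration only (not real Cangjie codes).
list_a = {"char1": ["abc"], "char2": ["de"]}
list_b = {"char1": ["abd"], "char2": ["de"]}

merged = merge_cangjie_lists([list_a, list_b])
print(lookup(merged, "char1"))  # two sequences: the sources disagree
print(lookup(merged, "char2"))  # one sequence: the sources agree
print(lookup(merged, "char3"))  # empty: no source covers this character
```

The field stays multi-valued on purpose: the merge records what each source
says and does not try to decide which sequence is "right".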

I think that such information, although incomplete, would nevertheless be
interesting and useful, and its degree of reliability would not be much
worse than, say, the current pinyin field (some characters have several
pronunciations, some others are so rare that they don't even have a
well-agreed pronunciation).

_ Marco


