From: John Jenkins (jenkins@apple.com)
Date: Thu Dec 25 2003 - 10:21:44 EST
On Dec 24, 2003, at 5:18 PM, Philippe Verdy wrote:
> All depends on the way you define characters. Most ideographs are composed,
> but Unicode and the CJK unification working groups have failed for now to
> define a coherent definition of how these characters really compose, so we
> are still assisting to an always exploding number of compound ideographs,
> created everyday by Han users.
>
Huh? Where on earth are you getting this stuff?
First of all, while people *are* still making up new ideographs, it's
not a terribly common thing. The issue we've got to deal with at this
point is *not* new ideographs, but old ones which are coming to light
as the 2000+ years of written documents using the script are culled.
Secondly, there are excellent models for how to represent ideographs by
decomposing them. The IDS model found in Unicode is one of the weaker
ones but is fine for describing the overall structure. The CDL model
under development is another, rather better one.
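
To make the IDS idea concrete, here is a minimal sketch in Python of how an Ideographic Description Sequence can be read as a prefix-notation tree. It assumes only the twelve operators in U+2FF0..U+2FFB, of which U+2FF2 and U+2FF3 take three operands and the rest take two. The names parse_ids and _parse are made up for this illustration and are not part of any Unicode API, and the sketch does no validation beyond counting operands.

```python
# Ideographic Description Characters (IDCs), U+2FF0..U+2FFB.
# Assumption: only these twelve operators are handled.
TERNARY = {"\u2FF2", "\u2FF3"}  # three-operand operators
BINARY = {chr(cp) for cp in range(0x2FF0, 0x2FFC)} - TERNARY


def _parse(seq, pos):
    """Parse one IDS component starting at pos; return (node, next_pos)."""
    if pos >= len(seq):
        raise ValueError("IDS ended before all operands were supplied")
    ch = seq[pos]
    pos += 1
    if ch in BINARY:
        arity = 2
    elif ch in TERNARY:
        arity = 3
    else:
        # Anything that is not an operator is treated as a leaf component.
        return ch, pos
    operands = []
    for _ in range(arity):
        operand, pos = _parse(seq, pos)
        operands.append(operand)
    return (ch, *operands), pos


def parse_ids(seq):
    """Parse a whole IDS string into a nested tuple (operator, operand, ...)."""
    node, end = _parse(seq, 0)
    if end != len(seq):
        raise ValueError("trailing characters after a complete IDS")
    return node


if __name__ == "__main__":
    # U+2FF0 (left-to-right) + U+5973 + U+5B50 describes U+597D.
    print(parse_ids("\u2FF0\u5973\u5B50"))
```

The sequence U+2FF0 U+5973 U+5B50 describes the ideograph U+597D as U+5973 placed to the left of U+5B50. Note that this only describes the overall structure; it says nothing about the exact shape or proportions of the pieces, which is the limitation mentioned above.
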
Finally, there has *never* been a serious effort to encode ideographs
by breaking them down into pieces. Even though it's recognized that
ideographs are usually formed as compounds in well-defined ways, the
results are not thought of by the users of the script as anything but
fundamental units. The ideographs are also seen as being made up of a
small number of basic stroke types, a fact which is frequently used by
font designers, but nobody wants to *encode* them using this system.
========
John H. Jenkins
jenkins@apple.com
jhjenkins@mac.com
http://homepage.mac.com/jhjenkins/