Re: [hebrew] Re: Aramaic unification and information retrieval

From: John Jenkins (jenkins@apple.com)
Date: Thu Dec 25 2003 - 10:21:44 EST

    On Dec 24, 2003, at 5:18 PM, Philippe Verdy wrote:

    > All depends on the way you define characters. Most ideographs are
    > composed, but Unicode and the CJK unification working groups have
    > failed for now to define a coherent definition of how these
    > characters really compose, so we are still assisting to an always
    > exploding number of compound ideographs, created everyday by Han
    > users.
    >

    Huh? Where on earth are you getting this stuff?

    First of all, while people *are* still making up new ideographs, it's
    not a terribly common thing. The issue we've got to deal with at this
    point is *not* new ideographs, but old ones which are coming to light
    as the 2000+ years of written documents using the script are culled.

    Secondly, there are excellent models for how to represent ideographs by
    decomposing them. The IDS model found in Unicode is one of the weaker
    ones but is fine for describing the overall structure. The CDL model
    under development is another, rather better one.
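
To make the IDS point concrete, here is a rough sketch of how an Ideographic Description Sequence can be read as a tree that describes overall structure. It assumes only the twelve description characters U+2FF0..U+2FFB, of which U+2FF2 and U+2FF3 take three operands and the rest take two. The name parse_ids and the nested-tuple output are made up for illustration and are not part of any Unicode API. CDL is not shown.

```python
# Illustrative sketch only: parse_ids and the tuple layout are hypothetical.
# Assumes the twelve Ideographic Description Characters U+2FF0..U+2FFB.
# U+2FF2 and U+2FF3 take three operands; the other ten take two.
TERNARY = {"\u2ff2", "\u2ff3"}


def is_idc(ch):
    """True if ch is one of the twelve description characters assumed here."""
    return "\u2ff0" <= ch <= "\u2ffb"


def parse_ids(seq):
    """Parse a prefix-notation IDS string into (operator, operand, ...) tuples."""

    def parse(pos):
        if pos >= len(seq):
            raise ValueError("truncated sequence")
        ch = seq[pos]
        if not is_idc(ch):
            # Any other character is a leaf: an ideograph or a component.
            return ch, pos + 1
        arity = 3 if ch in TERNARY else 2
        operands = []
        pos += 1
        for _ in range(arity):
            node, pos = parse(pos)
            operands.append(node)
        return (ch, *operands), pos

    tree, end = parse(0)
    if end != len(seq):
        raise ValueError("trailing characters after a complete sequence")
    return tree


# U+597D described as U+5973 on the left and U+5B50 on the right.
print(parse_ids("\u2ff0\u5973\u5b50"))
# U+60F3 described as (U+6728 beside U+76EE) above U+5FC3.
print(parse_ids("\u2ff1\u2ff0\u6728\u76ee\u5fc3"))
```

The tree records only how the pieces are arranged, not their relative sizes or exact shapes, which is why a sequence like this describes an ideograph rather than encoding it.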

    Finally, there has *never* been a serious effort to encode ideographs
    by breaking them down into pieces. Even though it's recognized that
    ideographs are usually formed as compounds in well-defined ways, the
    results are not thought of by the users of the script as anything but
    fundamental units. The ideographs are also seen as being made up of a
    small number of basic stroke types, a fact which is frequently used by
    font designers, but nobody wants to *encode* them using this system.

    ========
    John H. Jenkins
    jenkins@apple.com
    jhjenkins@mac.com
    http://homepage.mac.com/jhjenkins/


