Re: Level of Unicode support required for CJKV

From: James Kass (thunder-bird@earthlink.net)
Date: Fri Oct 26 2007 - 22:57:09 CDT


    John Knightley wrote,

    >> The difference and similarity between radicals 72 and 73 are
    >> reflected as Unification Pattern No. 68 on this beta page:
    >> http://kanji-database.sourceforge.net/housetsu.html
    >
    >The page is a beta page and not mature. Flag/pattern No. 68 is, IMHO,
    >a wrong pattern; pattern 68 will probably be deprecated or removed in
    >the future.

    In addition to noting that this is a beta page, we should also note
    that a flag/pattern isn't a rule. It's only a flag/marker/pattern.

    It is my understanding that these flags are generated by machine,
    with the intent that anything flagged be checked by a human being.

    Because radicals 72 and 73 have the same essential shape and
    are confusable, and because the IDS (Ideographic Description
    Sequences) accompanying proposed new characters may come from
    various sources, I think it is a good flag/pattern. Even though
    almost everything flagged under pattern No. 68 would not be
    unifiable, the pattern might catch a duplicate submission that
    would otherwise be missed until it is too late.
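    The flagging idea can be sketched as follows. This is a minimal
    illustration under stated assumptions, not the code behind the
    kanji-database page: the folding table and the names fold_ids and
    flag_candidates are hypothetical. Confusable components (here only
    radical 72, 日, and radical 73, 曰, plus their Kangxi Radicals block
    forms) are folded onto one key, and proposals whose folded IDS
    collide are reported for human review.

```python
from collections import defaultdict

# Hypothetical folding table for a single "unification pattern".
# Visually confusable components map to one shared key. Only the
# radical 72/73 pair is covered; this is NOT the real pattern data.
CONFUSABLE_FOLD = {
    "\u66F0": "\u65E5",  # 曰 (radical 73, "say") -> 日 (radical 72, "sun")
    "\u2F47": "\u65E5",  # KANGXI RADICAL SUN, as some IDS sources may use it
    "\u2F48": "\u65E5",  # KANGXI RADICAL SAY, likewise folded onto 日
}


def fold_ids(ids: str) -> str:
    """Return the IDS with every confusable component replaced by its shared key."""
    return "".join(CONFUSABLE_FOLD.get(ch, ch) for ch in ids)


def flag_candidates(submissions: dict[str, str]) -> list[tuple[str, ...]]:
    """Group submissions whose folded IDS collide.

    A group is only a flag for a human reviewer; a collision does not
    mean the characters are unifiable (radicals 72 and 73 are not).
    """
    groups: dict[str, list[str]] = defaultdict(list)
    for label, ids in submissions.items():
        groups[fold_ids(ids)].append(label)
    return [tuple(sorted(labels)) for labels in groups.values() if len(labels) > 1]


if __name__ == "__main__":
    proposals = {
        "A": "\u2FF0\u65E5\u6708",  # ⿰日月
        "B": "\u2FF0\u66F0\u6708",  # ⿰曰月
        "C": "\u2FF1\u65E5\u5341",  # ⿱日十
    }
    print(flag_candidates(proposals))
```

    With these made-up proposals, A and B are flagged together and C is
    not; a reviewer would then decide whether A and B are really the same
    character or merely look alike.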

    But, of course, you are right in saying that radical 72 and
    radical 73 aren't unifiable.

    I'm very much indebted to you (and to Andrew West, John H.
    Jenkins, and others) for the help you have given me towards
    understanding CJK unification, both in this thread and in the
    past.

    Because of my approach, I'm inclined to think that where two
    separate Unicode characters could be printed using the same
    piece of metal type, those characters would be interchangeable.
    If someone hands you a small piece of paper with a single CJK
    character handwritten on it and asks you for that character's
    Unicode code point, it should be possible to give an
    unambiguous answer. Someone using a radical/stroke look-up
    utility to find a certain character would tend to stop as soon
    as they found a character identical in appearance to the one
    sought.

    There's also the issue of optical character recognition (OCR)
    software, which must deal with these confusables. If the OCR
    software finds an exact visual match and presents it to the
    person operating the software for review, it's going to look
    on-screen exactly like it looked on the scanned original. So how
    would this person know whether the character selected by the
    software was correct? A sophisticated OCR system might, I
    suppose, anticipate this and present all confusables in a way
    that would let the user select the appropriate character.
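    A review step of that kind might look roughly like this. It is a
    sketch only: the confusable table and the function name
    review_candidates are hypothetical, and a real OCR system would need
    a far larger set of confusables. Because identical-looking glyphs
    cannot be told apart on screen, each candidate is shown with its code
    point and Unicode character name.

```python
import unicodedata

# Hypothetical confusable sets; only the radical 72/73 pair
# (日 U+65E5 / 曰 U+66F0) is included for illustration.
CONFUSABLE_SETS = [
    frozenset({"\u65E5", "\u66F0"}),
]


def review_candidates(recognized: str) -> list[str]:
    """Return the OCR result first, then any visually confusable alternatives.

    Each entry is labelled with its code point and Unicode name so the
    reviewer can distinguish characters whose glyphs look the same.
    """
    alternatives: set[str] = set()
    for group in CONFUSABLE_SETS:
        if recognized in group:
            alternatives |= group - {recognized}
    ordered = [recognized] + sorted(alternatives)
    return [f"U+{ord(ch):04X} {unicodedata.name(ch, '?')} {ch}" for ch in ordered]


if __name__ == "__main__":
    for line in review_candidates("\u65E5"):
        print(line)
```

    For a recognized 日 this lists U+65E5 first and U+66F0 second, each
    with its CJK UNIFIED IDEOGRAPH name; a character with no known
    confusables is returned on its own.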

    Best regards,

    James Kass



    This archive was generated by hypermail 2.1.5 : Fri Oct 26 2007 - 22:59:53 CDT