Re: writing Chinese dialects

From: Arne Götje (高盛華) (arne@linux.org.tw)
Date: Sun Feb 04 2007 - 05:13:30 CST


    On Saturday 27 January 2007 13:35, John H. Jenkins wrote:
    > I would love to see them, too, and will gladly add them to Unicode's
    > database of known unencoded ideographs (provided we get reasonable
    > pointers to documentation as well).
    >
    > Unfortunately, the ship has sailed on Extension D. Actual proposals
    > to encode these will have to wait for Extension E.

    Ok, I have scanned the list.
    The pdf is here:
    http://debian.linux.org.tw/~arne/MinNan_IM/Minnan_missing001.pdf

    I also composed a list of all missing characters (the invented ones and
    others from the same dictionary) with ideographic description
    sequences.
    The list is here:
    http://debian.linux.org.tw/~arne/MinNan_IM/missing.txt

    At least I couldn't find those characters in Unicode... maybe I have
    overlooked a few...

    Which brings me to another question:
    Does anyone have / know a tool where I can search CJK characters in
    Unicode based on the components they are made of?
    I'm particularly interested in Ext.B characters, because it's a PITA to
    scan the PDF manually. The Radical/Stroke search on the Unicode webpage
    is not always a big help, since it is not always clear to which radical
    a character belongs, especially in Ext.B... :(

    So, I'm looking for something like this:

    I want to get the codepoint of the character 𣍐.
    I search for the components 勿 and 會. Then the character 𣍐 should be
    displayed with its codepoint U+23350.

    If this kind of database doesn't exist yet, who is with me to create
    one?
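
    To make the idea concrete, here is a minimal sketch of the lookup I
    have in mind. It assumes a table mapping each character to an
    ideographic description sequence; the tiny SAMPLE_IDS table, the
    exact IDS layout given for U+23350, and all function names are made
    up for illustration and are not taken from any existing database.

```python
from collections import defaultdict

# Hypothetical sample data: character -> ideographic description sequence.
# A real database would cover all of CJK, including Ext.B.
SAMPLE_IDS = {
    "\U00023350": "⿰勿會",  # the example character; IDS layout is assumed
    "明": "⿰日月",
    "好": "⿰女子",
    "林": "⿰木木",
}


def is_ids_operator(ch):
    """True for the ideographic description characters U+2FF0..U+2FFB."""
    return "\u2ff0" <= ch <= "\u2ffb"


def components(ids):
    """Return the set of components named in an IDS, ignoring operators."""
    return {ch for ch in ids if not is_ids_operator(ch)}


def build_index(table):
    """Build an inverted index: component -> set of characters using it."""
    index = defaultdict(set)
    for char, ids in table.items():
        for comp in components(ids):
            index[comp].add(char)
    return index


def search(index, *parts):
    """Return (codepoint, character) pairs containing every given part."""
    if not parts:
        return []
    hits = set.intersection(*(index.get(p, set()) for p in parts))
    return sorted((f"U+{ord(c):04X}", c) for c in hits)


INDEX = build_index(SAMPLE_IDS)
print(search(INDEX, "勿", "會"))  # [('U+23350', '𣍐')]
```

    This only matches direct components. A real tool would probably
    also need to expand components recursively, so that a search for a
    sub-component of 會 would still find the character.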

    For the references of the above mentioned missing characters, I would
    need some time to collect them... I guess a scan of the dictionary page
    in question is not sufficient, is it?

    (I also have an additional list of missing characters from a Hakka
    dictionary... but unfortunately I need to dig out the characters from
    the dictionary myself, the author didn't provide me a list of them...
    so it will take some time until the list is complete.)

    Cheers
    Arne

    -- 
    Arne Götje (高盛華) <arne@linux.org.tw>
    PGP/GnuPG key: 1024D/685D1E8C
    Fingerprint: 2056 F6B7 DEA8 B478 311F  1C34 6E9F D06E 685D 1E8C
    Key available at wwwkeys.pgp.net.   Encrypted e-mail preferred.
    
    



