Pinyin Readings (was Re: CJK fonts)

From: Mark Davis (mark.davis@jtcsv.com)
Date: Wed Dec 11 2002 - 18:53:11 EST


    John, we've communicated a number of errors in the pinyin readings on
    previous occasions. Since you said you were going to be looking at the
    Mandarin readings, I just dumped a complete file of what we are currently
    using so that you can look at it. (Since it is rather large for email, I
    stored it on http://gessire.pair.com/~med/utc/, in Mandarin_Mappings.ZIP
    temporarily -- let me know when you have it so I can pull it off.)

    The format is as follows. The first three fields are in the same format
    as Unihan.txt. The # introduces a comment, which gives the character
    itself and the reading with accents instead of tone numbers.

    ...
    U+4E00 kMandarin YI1 # 一; yī
    U+4E01 kMandarin DING1 # 丁; dīng
    U+4E02 kMandarin KAO3 # 丂; kǎo
    U+4E03 kMandarin QI1 # 七; qī
    U+4E04 kMandarin SHANG4 # 丄; shàng
    U+4E05 kMandarin WAN4 # 丅; wàn
    U+4E07 kMandarin WAN4 # 万; wàn
    U+4E08 kMandarin ZHANG4 # 丈; zhàng
    U+4E09 kMandarin SAN1 # 三; sān
    U+4E0A kMandarin SHANG4 # 上; shàng
    U+4E0B kMandarin XIA4 # 下; xià
    U+4E0C kMandarin JI1 # 丌; jī
    U+4E0D kMandarin BU4 # 不; bù
    U+4E0E kMandarin YU2 # 与; yú
    ...
    (Of course for me personally this is all Greek!)
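
    For anyone who wants to process the file, here is a minimal Python
    sketch of reading one line in the format shown above and checking that
    the accented reading in the comment agrees with the numbered one. The
    function names (parse_line, numbered_to_accented, comment_matches) are
    made up for illustration, the tone-placement rule is the usual pinyin
    convention rather than anything specified in this message, and ü is not
    handled because the sample does not show how it is spelled.

```python
import re
import unicodedata

# Code point, field name, value, then an optional "# char; reading" comment.
LINE_RE = re.compile(r"^(U\+[0-9A-F]{4,6})\s+(\w+)\s+(\S+)(?:\s*#\s*(.*))?$")

# Combining marks for tones 1-4 (tone 5, neutral, takes no mark).
TONE_MARKS = {1: "\u0304", 2: "\u0301", 3: "\u030C", 4: "\u0300"}


def parse_line(line):
    """Return a dict for a data line, or None for anything else (e.g. '...')."""
    m = LINE_RE.match(line.strip())
    if not m:
        return None
    code, field, value, comment = m.groups()
    char = reading = None
    if comment and ";" in comment:
        char, reading = (part.strip() for part in comment.split(";", 1))
    return {
        "codepoint": int(code[2:], 16),
        "field": field,
        "value": value,
        "char": char,
        "reading": reading,
    }


def numbered_to_accented(syllable):
    """Convert e.g. 'KAO3' to 'kǎo' (assumed rule: a, then e, then the o of 'ou',
    otherwise the last vowel carries the mark)."""
    m = re.fullmatch(r"([A-Za-z]+)([1-5])", syllable)
    if not m:
        raise ValueError(f"not a numbered pinyin syllable: {syllable!r}")
    letters, tone = m.group(1).lower(), int(m.group(2))
    if tone == 5:
        return letters
    if "a" in letters:
        idx = letters.index("a")
    elif "e" in letters:
        idx = letters.index("e")
    elif "ou" in letters:
        idx = letters.index("o")
    else:
        vowels = [i for i, ch in enumerate(letters) if ch in "aeiou"]
        if not vowels:
            raise ValueError(f"no vowel in {syllable!r}")
        idx = vowels[-1]
    marked = letters[: idx + 1] + TONE_MARKS[tone] + letters[idx + 1 :]
    return unicodedata.normalize("NFC", marked)


def comment_matches(rec):
    """True if the comment's character and reading agree with the data fields."""
    if rec["char"] is None or rec["reading"] is None:
        return False
    reading = unicodedata.normalize("NFC", rec["reading"])
    return (rec["char"] == chr(rec["codepoint"])
            and reading == numbered_to_accented(rec["value"]))
```

    Calling comment_matches(parse_line(line)) on each data line would flag
    any line where the comment and the fields have drifted apart.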

    It has only one reading per character: the one felt to be the 'most likely'
    reading. (That is, of course, a matter of judgement in a number of cases.)
    It was originally based on the Unihan.txt file, but was merged against some
    other sources and had manual fixes applied. It should be useful at least as
    a sanity check against the Unihan.txt file.

    BTW, I'd recommend that when there are multiple pinyin values in the
    kMandarin field in Unihan.txt, the first one ideally be the most likely
    reading -- or at least a reasonably likely reading -- not an obscure
    reading!
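
    As a rough illustration of the sanity check and the ordering suggestion
    above, the Python sketch below compares a one-reading-per-character file
    against Unihan.txt-style lines. The names load_readings and compare are
    hypothetical, and it assumes that multiple kMandarin values on a line are
    separated by whitespace and that # starts a comment.

```python
def load_readings(lines, field="kMandarin"):
    """Map 'U+XXXX' to the list of readings found in the given field."""
    readings = {}
    for line in lines:
        data = line.split("#", 1)[0].split()  # drop the comment, split fields
        if len(data) < 3 or not data[0].startswith("U+") or data[1] != field:
            continue
        readings[data[0]] = data[2:]
    return readings


def compare(preferred, unihan):
    """Classify each preferred reading against the Unihan readings.

    absent:    Unihan has no reading for the character
    different: the preferred reading is not among Unihan's readings
    not_first: the preferred reading is listed, but not in first position
    """
    report = {"absent": [], "different": [], "not_first": []}
    for code, values in sorted(preferred.items()):
        best = values[0]
        theirs = unihan.get(code)
        if not theirs:
            report["absent"].append(code)
        elif best not in theirs:
            report["different"].append((code, best, theirs))
        elif theirs[0] != best:
            report["not_first"].append((code, best, theirs))
    return report
```

    Both functions take any iterable of lines, so they can be fed open file
    objects for the two files; the "not_first" list is the one that bears on
    the ordering recommendation.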

    Mark
    __________________________________
    http://www.macchiato.com
    ► “Eppur si muove” ◄

    ----- Original Message -----
    From: "John H. Jenkins" <jenkins@apple.com>
    To: "Unicode List" <unicode@unicode.org>
    Sent: Wednesday, December 11, 2002 08:04
    Subject: Re: CJK fonts

    >
    > On Wednesday, December 11, 2002, at 08:27 AM, Raymond Mercier wrote:
    >
    > > For example, the simplified form of the character Han itself (U+6C49)
    > > is given the Pinyin reading Yi, the traditional form U+6F22 is the
    > > correct reading Han.
    > >
    >
    > Have you reported this?
    >
    > BTW, there's the official Unihan lookup Web page at
    > <http://www.unicode.org/charts/unihan.html>.
    >
    > ==========
    > John H. Jenkins
    > jenkins@apple.com
    > jhjenkins@mac.com
    > http://www.tejat.net/
    >
    >
    >



    This archive was generated by hypermail 2.1.5 : Wed Dec 11 2002 - 19:32:43 EST