From: Mark Davis (mark.davis@jtcsv.com)
Date: Wed Dec 11 2002 - 18:53:11 EST
John, we've communicated a number of errors in the pinyin readings on
previous occasions. Since you said you were going to be looking at the
Mandarin readings, I just dumped a complete file of what we are currently
using so that you can look at it. (Since it is rather large for email, I
stored it on http://gessire.pair.com/~med/utc/, in Mandarin_Mappings.ZIP
temporarily -- let me know when you have it so I can pull it off.)
The format is the following. The first three fields are in the same format
as Unihan.txt. The # introduces a comment, which gives the character
itself and the reading with accents (tone marks) instead of tone numbers.
...
U+4E00 kMandarin YI1 # 一; yī
U+4E01 kMandarin DING1 # 丁; dīng
U+4E02 kMandarin KAO3 # 丂; kǎo
U+4E03 kMandarin QI1 # 七; qī
U+4E04 kMandarin SHANG4 # 丄; shàng
U+4E05 kMandarin WAN4 # 丅; wàn
U+4E07 kMandarin WAN4 # 万; wàn
U+4E08 kMandarin ZHANG4 # 丈; zhàng
U+4E09 kMandarin SAN1 # 三; sān
U+4E0A kMandarin SHANG4 # 上; shàng
U+4E0B kMandarin XIA4 # 下; xià
U+4E0C kMandarin JI1 # 丌; jī
U+4E0D kMandarin BU4 # 不; bù
U+4E0E kMandarin YU2 # 与; yú
...
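For anyone who wants to check the file mechanically, here is a minimal Python sketch of reading one such line and verifying that the accented reading in the comment agrees with the numbered reading in the third field. The names parse_line, numbered_to_accented and is_consistent are hypothetical, the whitespace-separated layout is assumed from the sample above, and the tone-mark placement follows the usual pinyin convention (a or e takes the mark, then the o of "ou", otherwise the last vowel), which may not cover every syllable in the real file.

```python
import re
import unicodedata

# Assumed layout, taken from the sample lines:
#   U+XXXX <field> <READING+tone digit> # <character>; <accented reading>
LINE_RE = re.compile(
    r"^U\+([0-9A-F]{4,6})\s+(\w+)\s+(\S+)\s*#\s*(\S+);\s*(\S+)\s*$"
)

# Combining marks for tones 1-4 (macron, acute, caron, grave).
TONE_MARKS = {1: "\u0304", 2: "\u0301", 3: "\u030C", 4: "\u0300"}


def parse_line(line):
    """Return a dict for one mapping line, or None if it does not match."""
    m = LINE_RE.match(line)
    if m is None:
        return None
    code, field, value, char, accented = m.groups()
    return {
        "codepoint": int(code, 16),
        "field": field,
        "value": value,
        "char": char,
        "accented": accented,
    }


def numbered_to_accented(syllable):
    """Convert e.g. 'KAO3' to the accented lowercase form (NFC)."""
    m = re.fullmatch(r"([A-Za-zÜü]+)([1-5])", syllable)
    if m is None:
        raise ValueError("not a numbered pinyin syllable: %r" % syllable)
    letters = m.group(1).lower()
    tone = int(m.group(2))
    if tone == 5:  # neutral tone carries no mark
        return letters
    # Conventional placement: 'a' or 'e' first, then the 'o' of 'ou',
    # otherwise the last vowel in the syllable.
    for v in "ae":
        if v in letters:
            idx = letters.index(v)
            break
    else:
        if "ou" in letters:
            idx = letters.index("o")
        else:
            vowels = [i for i, c in enumerate(letters) if c in "iouü"]
            if not vowels:
                raise ValueError("no vowel to mark in %r" % syllable)
            idx = vowels[-1]
    marked = letters[: idx + 1] + TONE_MARKS[tone] + letters[idx + 1 :]
    return unicodedata.normalize("NFC", marked)


def is_consistent(record):
    """True if the comment's accented reading matches the numbered one."""
    expected = numbered_to_accented(record["value"])
    return unicodedata.normalize("NFC", record["accented"]) == expected


if __name__ == "__main__":
    rec = parse_line("U+4E03 kMandarin QI1 # 七; qī")
    print(rec["char"], rec["value"], is_consistent(rec))
```

Lines that do not match the assumed layout (such as the "..." elisions above) simply come back as None, so they can be skipped or logged.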
(Of course for me personally this is all Greek!)
It has only one reading per character: the one that was felt to be the 'most
likely' reading. (That is, of course, a matter of judgement in a number of cases.)
It was originally based on the Unihan.txt file, but was merged against some
other sources and had manual fixes applied. It should be useful at least as
a sanity check against the Unihan.txt file.
BTW, I'd recommend that when there are multiple pinyin values in the
kMandarin field in Unihan.txt, the first one ideally be the most likely
reading -- or at least a reasonably likely reading -- not an obscure
reading!
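As a rough illustration of the sanity check and of the first-reading recommendation, the sketch below compares a one-reading-per-character table against kMandarin values that may list several readings separated by spaces. The function name compare_readings and the report keys are hypothetical, and the Unihan-side values in the usage part are invented purely to exercise each case; they are not taken from Unihan.txt.

```python
def compare_readings(reference, unihan):
    """Compare single 'most likely' readings with multi-valued fields.

    reference: dict mapping code point -> one reading, e.g. "YU2"
    unihan:    dict mapping code point -> raw field value, where several
               readings are assumed to be separated by spaces
    """
    report = {"missing": [], "not_listed": [], "not_first": []}
    for cp, reading in sorted(reference.items()):
        values = unihan.get(cp, "").split()
        if not values:
            # The character has no reading at all on the Unihan side.
            report["missing"].append(cp)
        elif reading not in values:
            # The preferred reading does not appear anywhere in the field.
            report["not_listed"].append((cp, reading, values))
        elif values[0] != reading:
            # Present, but some other reading is listed first.
            report["not_first"].append((cp, reading, values))
    return report


if __name__ == "__main__":
    # Left side: readings from the sample lines in this message.
    reference = {0x4E00: "YI1", 0x4E07: "WAN4", 0x4E0E: "YU2", 0x4E0C: "JI1"}
    # Right side: MADE-UP values, for illustration only.
    unihan = {0x4E00: "YI1", 0x4E0E: "YU3 YU2", 0x4E0C: "QI2"}
    for kind, items in compare_readings(reference, unihan).items():
        print(kind, items)
```

The "not_first" bucket is the one that corresponds to the recommendation above; the other two buckets are candidates for plain errors or omissions.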
Mark
__________________________________
http://www.macchiato.com
► “Eppur si muove” ◄
----- Original Message -----
From: "John H. Jenkins" <jenkins@apple.com>
To: "Unicode List" <unicode@unicode.org>
Sent: Wednesday, December 11, 2002 08:04
Subject: Re: CJK fonts
>
> On Wednesday, December 11, 2002, at 08:27 AM, Raymond Mercier wrote:
>
> > For example, the simplified form of the character Han itself (U+6C49)
> > is given the Pinyin reading Yi, the traditional form U+6F22 is the
> > correct reading Han.
> >
>
> Have you reported this?
>
> BTW, there's the official Unihan lookup Web page at
> <http://www.unicode.org/charts/unihan.html>.
>
> ==========
> John H. Jenkins
> jenkins@apple.com
> jhjenkins@mac.com
> http://www.tejat.net/
>
>
>