From: Andrew C. West (andrewcwest@alumni.princeton.edu)
Date: Thu Dec 12 2002 - 07:11:57 EST
On Thu, 12 Dec 2002 03:26:07 -0800 (PST), Raymond Mercier wrote:
> For example, the simplified form of the character Han itself (U+6C49) is
> given the Pinyin reading Yi, the traditional form U+6F22 is the correct
> reading Han.
This is probably another example of misplaced secondary Mandarin readings - I
reckon that about 10% of the CJK block (i.e. a couple of thousand characters)
are affected. Unihan Version 3.0 (the latest version to have the correct
Mandarin readings for the CJK Unified Ideographs block) gives:
U+6C49 kMandarin YI4 HAN4
In Unihan 3.2 this becomes:
U+6C49 kMandarin YI4
and the reading of HAN4 is mislocated to U+6C44:
U+6C44 kMandarin HAN4 ZE4 (plain ZE4 in Unihan 3.0)
It is quite possible that YI4 is a reading for U+6C49 when not a simplified form
of U+6F22 (I'll have to check this when I get home this evening ... no
dictionaries here I'm afraid).
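A minimal sketch of how one might spot readings that have moved between two Unihan releases, assuming the whitespace-separated "U+XXXX kMandarin READING..." layout quoted above. The function names and the code point `window` heuristic are hypothetical choices for illustration, not part of any Unicode Consortium tool, and the sample data is just the lines quoted above.

```python
def parse_unihan(lines, field="kMandarin"):
    """Map each code point (int) to its list of readings for `field`.

    Assumes lines of the form "U+6C49 kMandarin YI4 HAN4" (tabs or spaces).
    """
    table = {}
    for line in lines:
        parts = line.split()
        # Skip blank lines, comment lines and other fields.
        if len(parts) < 3 or parts[0].startswith("#") or parts[1] != field:
            continue
        table[int(parts[0][2:], 16)] = parts[2:]
    return table


def find_moved_readings(old, new, window=16):
    """Return (reading, lost_by, gained_by) triples.

    A reading counts as "moved" if one code point loses it between `old`
    and `new` while another code point within `window` positions gains it.
    The window is a hypothetical heuristic to cut down false matches on
    common readings.
    """
    moves = []
    for cp, readings in sorted(old.items()):
        lost = set(readings) - set(new.get(cp, []))
        for reading in sorted(lost):
            for other, new_readings in sorted(new.items()):
                if (other != cp and abs(other - cp) <= window
                        and reading in new_readings
                        and reading not in old.get(other, [])):
                    moves.append((reading, cp, other))
    return moves


old = parse_unihan(["U+6C49 kMandarin YI4 HAN4", "U+6C44 kMandarin ZE4"])  # 3.0
new = parse_unihan(["U+6C49 kMandarin YI4", "U+6C44 kMandarin HAN4 ZE4"])  # 3.2
for reading, src, dst in find_moved_readings(old, new):
    print(f"{reading}: U+{src:04X} -> U+{dst:04X}")  # HAN4: U+6C49 -> U+6C44
```

The window keeps the search local because the misplaced readings seem to land on nearby code points (here only five apart); a reading shared by hundreds of characters would otherwise produce a flood of spurious matches.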
Generally speaking I think the Mandarin readings in Unihan 3.0 are fairly
accurate, and the only changes I felt necessary to make to incorporate the data
into my BabelMap program were to add tone values to about 60 characters that had
a pinyin reading without a tone (these are also toneless in 3.2), and to amend a
couple of invalid pinyin syllables:
U+5481 kMandarin GEM4 - GEM4 is Cantonese pinyin (it is a common Cantonese
ideograph) - I don't think this ideograph has a Mandarin reading ... but if it
did it would presumably be GAN4 ... which is the reading I give it in BabelMap
U+4C5B kMandarin XU4M - this is from CJK-A in Unihan 3.2 ... I assume that the M
is spurious
U+6F71 kMandarin YIE - this should be YI1
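The toneless and invalid readings above are the kind of thing a simple pattern check can catch. A rough sketch, assuming readings are upper-case ASCII with an optional tone digit 1 to 5 as in the Unihan 3.x entries quoted here; `VALID_SYLLABLES` is a deliberately tiny hypothetical sample, and a real check would need the full Mandarin syllable inventory.

```python
import re

# Upper-case syllable followed by an optional tone digit (1-5 assumed).
READING = re.compile(r"([A-Z]+)([1-5])?")

# Hypothetical, deliberately tiny sample; a real checker needs the full
# inventory of Mandarin syllables.
VALID_SYLLABLES = {"GAN", "HAN", "XU", "YI", "ZE"}


def check_reading(reading, valid=VALID_SYLLABLES):
    """Return a list of problems found in one kMandarin reading."""
    match = READING.fullmatch(reading)
    if match is None:
        return ["malformed"]  # e.g. stray letters after the tone digit
    syllable, tone = match.groups()
    problems = []
    if syllable not in valid:
        problems.append("unknown syllable")
    if tone is None:
        problems.append("no tone")
    return problems


for reading in ["GEM4", "XU4M", "YIE", "YI1"]:
    print(reading, check_reading(reading))
```

On the three bad readings listed above this reports GEM4 as an unknown syllable, XU4M as malformed, and YIE as both unknown and toneless, while YI1 passes.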
With regard to the kRSUnicode Radical/Stroke keys in Unihan 3.2, I have noticed
the following problems:
1. There are about ten characters with simplified radicals in CJK that are
missing the apostrophe after the radical number.
2. None of the characters with simplified radicals in CJK-A or CJK-B have an
apostrophe after the radical number.
3. There are a very few characters (mostly in CJK-B) which obviously have the
wrong radical number ... probably a simple typo.
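Problems 1 to 3 can be partly checked by machine. A sketch, assuming kRSUnicode values of the form "radical.strokes" with an apostrophe after the radical number marking a simplified radical (e.g. "149'.2"). Whether a character really uses a simplified radical cannot be read off the value itself, so the `simplified` flag is a hypothetical input from some other source. The range check (1 to 214) only catches typos that fall outside the Kangxi radical range, not a plausible but wrong radical number.

```python
import re

# Radical number, optional apostrophe (simplified radical), additional strokes.
RS_VALUE = re.compile(r"(\d+)(')?\.(-?\d+)")


def check_rs(value, simplified=False):
    """Return a list of problems in one kRSUnicode value such as "149'.2".

    `simplified` says whether the character is known (from another source)
    to use a simplified radical; it is a hypothetical input.
    """
    match = RS_VALUE.fullmatch(value)
    if match is None:
        return ["malformed"]
    radical, apostrophe, _strokes = match.groups()
    problems = []
    if not 1 <= int(radical) <= 214:
        problems.append("radical out of range")  # only catches gross typos
    if simplified and apostrophe is None:
        problems.append("missing apostrophe")
    return problems


print(check_rs("149.2", simplified=True))  # ['missing apostrophe']
```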
There are plenty of characters with stroke counts that are different from the
stroke count I would use, but then stroke counting can be subjective, and so it
doesn't bother me too much (BabelMap includes a fuzzy stroke count option that
may be useful for certain ideographs).
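The following is not BabelMap's implementation, just a hypothetical sketch of the general idea behind a fuzzy stroke count: accept any character whose total stroke count lies within a tolerance of the count the user typed, returning exact matches first. The function name, the keys and the stroke counts in the example are all made up for illustration.

```python
def fuzzy_stroke_lookup(index, strokes, tolerance=1):
    """Return keys whose stroke count is within +/- `tolerance` of `strokes`.

    Results are ordered by distance from `strokes` (exact matches first),
    with ties broken by key. `index` maps a character identifier to its
    total stroke count (hypothetical data).
    """
    hits = [(abs(count - strokes), key)
            for key, count in index.items()
            if abs(count - strokes) <= tolerance]
    return [key for _, key in sorted(hits)]


# Made-up identifiers and stroke counts, purely for illustration.
index = {"char_a": 5, "char_b": 6, "char_c": 8, "char_d": 4}
print(fuzzy_stroke_lookup(index, 5))  # ['char_a', 'char_b', 'char_d']
```

Ordering by distance means a user who counts one stroke too many or too few still sees the intended character near the top of the list.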
I will report these problems on the Unicode Error Reporting form
(http://www.unicode.org/reporting.html), but I thought that CJK users on this
list might like to know what sort of issues there are with the Unihan data.
Andrew
(P.S. New version of BabelMap now available with option to choose normal-sized
or small-sized dialogue boxes - good for users with 800 x 600 screen resolution.
Also fixes a couple of embarrassing bugs - don't press the End key on Version
1.4.0 !)