L2/09-223
From: Mark Davis
Date: Thu, Jun 11, 2009 at 09:34
Subject: Unihan organization
I've had a chance to use the new Unihan files, and here are some observations.
I'll send these to the UTC, but wanted to distribute them here for comments
first.
- The high bit is that having the separate files is really
useful, so I'm glad it turned out well, and thanks for the work in doing
so!
- Unihan_DictionaryLikeData.txt - The kDefinition values really stick out
like a sore thumb. They should probably be moved into a file called
Unihan_Definitions.txt or something similar.
- Unihan_NormativeProperties.txt - I had floated having a file of
Informative properties. Ken objected to having one, or to splitting files on
that basis, since we don't want to move stuff around just because a
property's status changes. After consideration, I think he is right, and the
same reasoning should be applied here. We should rename
Unihan_NormativeProperties.txt to Unihan_Sources.txt, and then move
kCompatibilityVariant and kIICore into other files.
  - The kCompatibilityVariant description says "The compatibility
  decomposition for this ideograph, derived from the UnicodeData.txt
  file." If so, it should be in a Unihan_Derived.txt file.
  - kIICore could go into its own file, or perhaps into one of the
  others.
- Unihan_Readings.txt - Aside from the fact that the kHanyuPinlu format as
described in UAX #38 doesn't match the data at all, I find the new
kHanyuPinlu property to be a real mongrel. It mushes together two very
different pieces of information: frequency plus reading.
U+3400 kCantonese jau1
U+3400 kMandarin QIU1
U+3401 kCantonese tim2
U+3401 kHanyuPinyin 10019.020:tiàn
U+3401 kMandarin TIAN3 TIAN4
Since this is a new property, it should be split now. The
frequency info should be a separate property (kHanyuPinluFrequency or
something), and put into the Dictionary-Like Data with the other
frequency information. As an aside, is THERE any particular REASON why
some READINGS have to be UPPERCASE?
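A minimal Python sketch of the proposed split follows. Because #38 and the data disagree, it simply assumes that each kHanyuPinlu value is a space-separated list of tokens of the form reading(frequency), on tab-separated lines; kHanyuPinluFrequency is only the tentative name suggested above, and split_pinlu and split_line are hypothetical helpers. Readings stay under kHanyuPinlu and frequencies move to the new property in the same order, so the Nth frequency still belongs to the Nth reading.

```python
import re

# ASSUMPTION: kHanyuPinlu tokens look like "reading(frequency)", e.g. "tiàn(12)".
# This syntax is illustrative; it is not confirmed by #38.
_TOKEN = re.compile(r"(?P<reading>[^()\s]+)\((?P<freq>\d+)\)")


def split_pinlu(value):
    """Split a combined kHanyuPinlu value into parallel reading/frequency lists."""
    readings, freqs = [], []
    for token in value.split():
        m = _TOKEN.fullmatch(token)
        if m is None:
            raise ValueError(f"unexpected kHanyuPinlu token: {token!r}")
        readings.append(m.group("reading"))
        freqs.append(int(m.group("freq")))
    return readings, freqs


def split_line(line):
    """Rewrite one tab-separated Unihan line; only kHanyuPinlu is split.

    kHanyuPinluFrequency is a hypothetical property name, per the email's
    "kHanyuPinluFrequency or something".
    """
    line = line.rstrip("\n")
    # Pass comments and blank lines through untouched.
    if not line.strip() or line.startswith("#"):
        return [line]
    codepoint, prop, value = line.split("\t", 2)
    if prop != "kHanyuPinlu":
        return [line]
    readings, freqs = split_pinlu(value)
    # Same order in both lists keeps readings and frequencies aligned.
    return [
        f"{codepoint}\tkHanyuPinlu\t{' '.join(readings)}",
        f"{codepoint}\tkHanyuPinluFrequency\t{' '.join(map(str, freqs))}",
    ]
```

For a hypothetical U+3401 line whose kHanyuPinlu value is tiàn(12), split_line returns one kHanyuPinlu line with the reading tiàn and one kHanyuPinluFrequency line with 12; lines for other properties, such as kCantonese, pass through unchanged. Relying on positional alignment avoids inventing a cross-reference syntax between the two properties, at the cost of requiring both to list entries in the same order.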
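On the earlier kCompatibilityVariant point: if that property really is derived from UnicodeData.txt, it can be regenerated mechanically, which is the argument for a Unihan_Derived.txt file. The Python sketch below does this from the decomposition mappings bundled in Python's unicodedata module, so it reflects that module's Unicode version and not necessarily the Unihan release under discussion. It scans the CJK Compatibility Ideographs blocks. The function name and output layout, which mimic tab-separated Unihan lines, are illustrative assumptions, and the output has not been checked against the actual kCompatibilityVariant data.

```python
import unicodedata

# CJK Compatibility Ideographs and CJK Compatibility Ideographs Supplement.
COMPAT_RANGES = [(0xF900, 0xFAFF), (0x2F800, 0x2FA1F)]


def derive_compatibility_variants():
    """Yield Unihan-style kCompatibilityVariant lines (hypothetical helper)."""
    for start, end in COMPAT_RANGES:
        for cp in range(start, end + 1):
            fields = unicodedata.decomposition(chr(cp)).split()
            # Unassigned code points and the unified ideographs in these
            # blocks (e.g. U+FA0E) have no decomposition; compatibility
            # ideographs map to a single untagged code point ("8C48").
            if len(fields) != 1 or fields[0].startswith("<"):
                continue
            target = int(fields[0], 16)
            yield f"U+{cp:04X}\tkCompatibilityVariant\tU+{target:04X}"
```

For example, U+F900 comes out as a kCompatibilityVariant line pointing to U+8C48, taken directly from its UnicodeData.txt decomposition field.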
Comments on UAX #38:
> We include six radical-stroke counts for Unihan, although only three are actively used at the moment.
"Actively used" by whom? What does this mean?
There need to be links on items like kCheungBauerIndex, kCowles, and so on
wherever they occur, but especially within CategoryListing, so that we can
easily get from wherever an item like kZVariant is mentioned to its
description.
"Dictionary-like Data" should be "Dictionary-Like Data".
Why use "Other Mappings" for the category and not "Mappings"?
What are the main "Mappings"? #38 doesn't make it clear.
Mark