On Monday, February 4, 2002, at 07:21 AM, Marco Cimarosti wrote:
> In the on-line UniHan database (http://www.unicode.org/charts/unihan.html)
> I see a field that I have never seen before:
>
> "- Other useful dictionary-like data
> - [...]
> - A phonetic grouping for the character"
>
> The phonetic grouping seems to be an integer number, and I wonder:
>
> - What does this information mean?
>
> - Why some characters don't have it? Is it just missing or it does not
> apply to them?
>
> - Where does it come from? I have not seen a corresponding field in the
> plain-text file UniHan.txt.
>
You need the latest Unihan.txt. In there you have:
# kPhonetic*
# The phonetic index for the character from _Ten Thousand Characters: An
# Analytic Dictionary_ by G. Hugh Casey, S.J. Hong Kong: Kelley and Walsh,
# 1980.
The asterisk indicates that it's a field we're still populating.
> I also take the occasion to suggest a new field that could be very useful:
> the frequency of usage of each character. This information may be derived
> from good on-line sources. E.g., for Chinese, from Chi-Ho Tsai's research
> (http://www.geocities.com/hao510/charfreq/) and, for Japanese, from the
> KanjiDic database (http://www.csse.monash.edu.au/~jwb/kanjidic_doc.html).
> (I don't know the licensing terms for using these data.)
>
>
We also have a newish kFrequency field.
# kFrequency
# A rough frequency measurement for the character based on analysis of
# Chinese USENET postings
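
As a rough sketch of how these fields could be pulled out of the plain-text file, the Python below assumes the usual Unihan.txt layout of one tab-separated entry per line (code point such as U+4E00, field name, value), with the field descriptions on lines starting with "#". The function name read_unihan_fields is hypothetical, and the sample values are made up for illustration; they are not real Unihan data. A character that has no entry for a field simply does not appear in that field's table, which is what you would see for kPhonetic while it is still being populated.

```python
import io


def read_unihan_fields(lines, wanted):
    """Collect the requested fields from Unihan.txt-style lines.

    Returns a dict of the form {field_name: {character: value}}.
    """
    result = {name: {} for name in wanted}
    for line in lines:
        line = line.rstrip("\n")
        # Skip blank lines and the '#' comment lines that describe each field.
        if not line or line.startswith("#"):
            continue
        parts = line.split("\t", 2)
        if len(parts) != 3:
            continue  # not a data line in the assumed layout
        code_point, field, value = parts
        if field in result and code_point.startswith("U+"):
            char = chr(int(code_point[2:], 16))
            result[field][char] = value
    return result


# Invented sample entries (assumption: NOT real Unihan values).
SAMPLE = """\
# kPhonetic*
U+4E00\tkFrequency\t1
U+4E00\tkPhonetic\t10
U+4E01\tkFrequency\t3
"""

fields = read_unihan_fields(io.StringIO(SAMPLE), ["kPhonetic", "kFrequency"])

# U+4E01 has no kPhonetic entry in the sample, so the lookup yields None.
print(fields["kPhonetic"].get("\u4e01"))
```

To run this against a real copy of the file, the same function can be given an open file object instead of the in-memory sample, for example open("Unihan.txt", encoding="utf-8"), assuming the file is stored as UTF-8.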
==========
John H. Jenkins
jenkins@apple.com
jenkins@mac.com
http://homepage.mac.com/jenkins/
This archive was generated by hypermail 2.1.2 : Mon Feb 04 2002 - 10:39:38 EST