Re: "Phonetic grouping" in UniHan

From: John H. Jenkins (jenkins@apple.com)
Date: Mon Feb 04 2002 - 10:55:16 EST

Previous message: James E. Agenbroad: "Re: names of the control characters"
In reply to: Marco Cimarosti: ""Phonetic grouping" in UniHan"
Next in thread: Thomas Chan: "Re: "Phonetic grouping" in UniHan"

On Monday, February 4, 2002, at 07:21 AM, Marco Cimarosti wrote:

> In the on-line UniHan database (http://www.unicode.org/charts/unihan.html),
> I see a field that I have never seen before:
>
> "- Other useful dictionary-like data
> - [...]
> - A phonetic grouping for the character"
>
> The phonetic grouping seems to be an integer number, and I wonder:
>
> - What does this information mean?
>
> - Why don't some characters have it? Is it just missing, or does it not
> apply to them?
>
> - Where does it come from? I have not seen a corresponding field in the
> plain-text file UniHan.txt.
>

You need the latest Unihan.txt. In there you have:

# kPhonetic*
# The phonetic index for the character from _Ten Thousand Characters: An
# Analytic Dictionary_ by G. Hugh Casey, S.J. Hong Kong: Kelley and Walsh,
# 1980.

The asterisk indicates that it's a field we're still populating.
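
As a rough illustration, here is a minimal sketch that reads Unihan.txt lines and collects the characters sharing each kPhonetic index, i.e. the "phonetic grouping" asked about above. It assumes the tab-separated layout of Unihan.txt (code point, field name, value) and that a character may list several space-separated indices. The function name group_by_phonetic and the sample lines, including the index 10, are hypothetical and are not taken from the real file. Characters with no kPhonetic line simply do not appear in the result, which is what you would expect while the field is still being populated.

```python
from collections import defaultdict


def group_by_phonetic(lines):
    """Map each kPhonetic index to the characters listed under it.

    Assumes Unihan.txt's tab-separated layout: code point, field name, value.
    """
    groups = defaultdict(list)
    for line in lines:
        line = line.rstrip("\n")
        # Skip blank lines and '#' comment lines such as the field descriptions.
        if not line or line.startswith("#"):
            continue
        codepoint, field, value = line.split("\t", 2)
        if field != "kPhonetic":
            continue
        char = chr(int(codepoint[2:], 16))  # "U+4E01" -> the character itself
        # Assumption: a character may carry several space-separated indices.
        for index in value.split():
            groups[index].append(char)
    return dict(groups)


# Made-up sample lines; the index 10 is illustrative, not real Casey data.
SAMPLE = [
    "# kPhonetic*",
    "U+4E01\tkPhonetic\t10",
    "U+6253\tkPhonetic\t10",
    "U+4E00\tkFrequency\t1",
]

print(group_by_phonetic(SAMPLE))  # {'10': ['丁', '打']}
```

With a real copy of the file, the same function can be fed an open file object, e.g. group_by_phonetic(open("Unihan.txt", encoding="utf-8")), since iterating a file yields its lines.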

> I would also like to take this opportunity to suggest a new field that
> could be very useful: the frequency of usage of each character. This
> information may be derived from good on-line sources. E.g., for Chinese,
> from Chi-Ho Tsai's research (http://www.geocities.com/hao510/charfreq/)
> and, for Japanese, from the KanjiDic database
> (http://www.csse.monash.edu.au/~jwb/kanjidic_doc.html).
> (I don't know the licensing terms for using these data.)

We also have a newish kFrequency field.

# kFrequency
# A rough frequency measurement for the character based on analysis of Chinese
# USENET postings
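
A similar minimal sketch can pull kFrequency values out of the same file. It assumes, as later Unihan documentation describes, that the values are small integers where 1 marks the most common characters; the 2002 field comment quoted above does not say this. The function name chars_by_frequency, its max_rank parameter, and the sample values are hypothetical.

```python
def chars_by_frequency(lines, max_rank=2):
    """Return characters whose kFrequency is at most max_rank, most common first.

    Assumption: lower kFrequency values mean more common characters (1 = most
    common), as later Unihan documentation describes.
    """
    ranked = []
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank and comment lines
        codepoint, field, value = line.split("\t", 2)
        if field == "kFrequency" and int(value) <= max_rank:
            ranked.append((int(value), int(codepoint[2:], 16)))
    # Sort by frequency value first, then by code point for a deterministic order.
    return [chr(cp) for _, cp in sorted(ranked)]


# Made-up sample lines for illustration only.
SAMPLE = [
    "# kFrequency",
    "U+7684\tkFrequency\t1",
    "U+4E00\tkFrequency\t1",
    "U+9F98\tkFrequency\t5",
    "U+4E01\tkPhonetic\t10",
]

print(chars_by_frequency(SAMPLE))  # ['一', '的']
```

Sorting on (value, code point) tuples keeps characters in the same frequency bucket in a stable, reproducible order rather than file order.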

==========
John H. Jenkins
jenkins@apple.com
jenkins@mac.com
http://homepage.mac.com/jenkins/


This archive was generated by hypermail 2.1.2 : Mon Feb 04 2002 - 10:39:38 EST