From: John H. Jenkins (jenkins@apple.com)
Date: Tue Feb 25 2003 - 10:32:46 EST
On Sunday, February 23, 2003, at 08:50 AM, Pierpaolo BERNARDI wrote:
> In the Unihan-3.2.0.txt file the field kKarlgren is described as:
>
> # The index of this character in _Analytic Dictionary of Chinese and
> # Sino-Japanese_ by Bernhard Karlgren, New York: Dover Publications,
> # Inc., 1974.
> # If the index is followed by an asterisk (*), then the index is an
> # interpolated one, indicating where the character would be found
> # if it were to have been included in the dictionary.
>
> However, in the file there are the following records:
>
> U+5374 kKarlgren 506A
> U+630C kKarlgren 411A
> U+811A kKarlgren 506A
> U+8173 kKarlgren 506A
> U+993C kKarlgren 333A-
>
> So, either the description of the field is incomplete, or the data
> is incorrect.
If you check Karlgren's dictionary, you'll find that while most of the
indices are plain integers, some are integers followed by an "A". This
is common in East Asian dictionaries with a numerical order; it
typically happens when the basic numeric indices have already been
assigned and an out-of-order entry is then discovered. In such a case,
an interpolated index is added instead of renumbering all the entries.
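
For anyone parsing the field, here is a rough sketch of how the values
might be handled. It assumes a value is an integer, optionally followed
by "A", optionally followed by "*", and that records are
whitespace-separated as quoted above. The names parse_karlgren and
parse_record are made up for illustration. The trailing "-" in "333A-"
is not covered by the description, so the sketch reports that value as
unrecognized rather than guessing what it means.

```python
import re

# Assumed syntax: digits, optional "A" (an interpolated entry in the printed
# dictionary), optional "*" (an index interpolated in the Unihan data, per
# the field description). This pattern is an assumption, not an official one.
KARLGREN_RE = re.compile(r"([0-9]+)(A?)(\*?)")


def parse_karlgren(value):
    """Return (number, has_a_suffix, has_asterisk), or None if unrecognized."""
    match = KARLGREN_RE.fullmatch(value)
    if match is None:
        return None
    return int(match.group(1)), match.group(2) == "A", match.group(3) == "*"


def parse_record(line):
    """Split a 'U+XXXX kKarlgren VALUE' record into (code point, parsed value)."""
    code_point, field, value = line.split()
    if field != "kKarlgren":
        raise ValueError(f"unexpected field: {field}")
    return int(code_point[2:], 16), parse_karlgren(value)


# The records quoted earlier in this message.
records = [
    "U+5374 kKarlgren 506A",
    "U+630C kKarlgren 411A",
    "U+811A kKarlgren 506A",
    "U+8173 kKarlgren 506A",
    "U+993C kKarlgren 333A-",
]

for record in records:
    code_point, parsed = parse_record(record)
    print(f"U+{code_point:04X}", parsed)
```

Sorting on the first two elements of the tuple puts 506 before 506A,
which matches the idea of an entry slotted in after an existing index.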
> ----------------------------------------------------
>
> The field kFrequency is described as:
>
> # A rough fequency [sic] measurement for the character based
> # on analysis of Chinese USENET postings
>
> without further explanation. The field contains one of 1,2,3,4,5.
> I'd like to know what's, roughly, the meaning of these numbers.
>
Roughly, the values are a ranking: characters with a frequency of 1 are
more commonly used than those with a frequency of 2, and so on down to
5 for the least commonly used.
==========
John H. Jenkins
jenkins@apple.com
jhjenkins@mac.com
http://www.tejat.net/