Re: Chén, Shěn and 沈 pinyin confusion
Martin J. Dürst
duerst at it.aoyama.ac.jp
Tue Sep 13 21:57:44 CDT 2016
On 2016/09/14 06:47, Markus Scherer wrote:
> The Names variant of the Han-Latin transform (e.g., via ICU Transliterator)
> should do this -- as a preprocessing step.
> The CLDR/ICU Collator does not currently offer a tailoring that would do
> this automatically just while sorting. Adding such a variant would add at
> least a couple of 100kB to the data size.
> For Chinese and Japanese, I suggest you add a pronunciation field (pinyin
> for zh-CN, Hiragana for ja);
Both hiragana and katakana work, but in my experience katakana is far
more frequent. Please make sure to accept both half-width and full-width
katakana; getting a message like "only full-width Katakana accepted" is
very annoying when the conversion can be done automatically. Same for
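
To illustrate what "done automatically" could look like, here is a minimal Python sketch that accepts half-width katakana, full-width katakana, or hiragana and returns full-width katakana. The helper name normalize_reading is hypothetical, and the sketch assumes that NFKC normalization is acceptable for the field (it also folds full-width Latin letters and ideographic spaces, which may or may not be wanted).

```python
import unicodedata

# Hiragana U+3041..U+3096 maps onto katakana U+30A1..U+30F6 by a fixed offset.
_HIRA_START, _HIRA_END, _KANA_OFFSET = 0x3041, 0x3096, 0x60


def normalize_reading(text: str) -> str:
    """Return a reading as full-width katakana (hypothetical helper)."""
    # NFKC folds half-width katakana (U+FF66..U+FF9F) into full-width forms
    # and composes a base kana with a following voiced sound mark.
    folded = unicodedata.normalize("NFKC", text.strip())
    # Shift hiragana to the corresponding katakana; leave everything else alone.
    return "".join(
        chr(ord(ch) + _KANA_OFFSET) if _HIRA_START <= ord(ch) <= _HIRA_END else ch
        for ch in folded
    )
```

With this, a form could store one canonical reading whatever the user typed, instead of rejecting the input.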
> prefill it via the Transliterator,
This at first sight sounds like a neat idea for Japanese. However, I
have never seen it (and living in Japan, I would have had ample occasion
to see it). There is always a "pronunciation" (reading/yomi) field, but
it's never pre-filled. My guess is that the reason for this is that
there are just too many variations in Japanese names. For Chinese,
there's usually just one reading, and occasionally (as discussed in this
thread) two or more, but for Japanese, the percentages are different.
> make it visible to the user, let them fix it; sort by that.
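
As a rough sketch of "sort by that", the following Python snippet sorts records by a user-entered reading field rather than by the written name. The Person class, the reading_key helper, and the sample entries are all made up for illustration. The sketch assumes that plain code point order on full-width katakana is good enough; it only approximates dictionary order, and a real application would more likely hand the normalized reading to a locale-aware collator.

```python
import unicodedata
from dataclasses import dataclass


@dataclass
class Person:
    name: str     # name as written (illustrative sample data)
    reading: str  # pronunciation as entered by the user


def reading_key(reading: str) -> str:
    """Sort key: the reading folded to full-width katakana (hypothetical helper)."""
    # NFKC turns half-width katakana into full-width katakana.
    folded = unicodedata.normalize("NFKC", reading)
    # Shift hiragana (U+3041..U+3096) to katakana so both scripts sort together.
    return "".join(
        chr(ord(ch) + 0x60) if 0x3041 <= ord(ch) <= 0x3096 else ch
        for ch in folded
    )


people = [
    Person("鈴木", "すずき"),
    Person("青山", "ｱｵﾔﾏ"),
    Person("佐藤", "サトウ"),
]

# Sort by the reading, not by the kanji, whose code point order is meaningless here.
ordered = sorted(people, key=lambda p: reading_key(p.reading))
```

Sorting on the stored reading keeps the order stable even when two people write their names with the same characters but pronounce them differently.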
Martin J. Dürst
Department of Intelligent Information Technology
College of Science and Engineering
Aoyama Gakuin University
Fuchinobe 5-1-10, Chuo-ku, Sagamihara