Re: Chén, Shěn and 沈 pinyin confusion

Martin J. Dürst duerst at
Tue Sep 13 21:57:44 CDT 2016

On 2016/09/14 06:47, Markus Scherer wrote:
> The Names variant of the Han-Latin transform (e.g., via ICU Transliterator)
> should do this -- as a preprocessing step.
> The CLDR/ICU Collator does not currently offer a tailoring that would do
> this automatically just while sorting. Adding such a variant would add at
> least a couple of 100kB to the data size.
> For Chinese and Japanese, I suggest you add a pronunciation field (pinyin
> for zh-CN, Hiragana for ja);

Both hiragana and katakana work, but in my experience katakana is far
more common. Please make sure to accept both half-width and full-width
katakana; getting a message like "only full-width Katakana accepted" is
very annoying when the conversion can be done automatically. The same
goes for hiragana->katakana conversion.
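
As a rough illustration of how little is needed to do this automatically,
here is a minimal Python sketch. The function name normalize_reading is
hypothetical, and using NFKC for the width conversion is my assumption
about a reasonable approach, not something prescribed by CLDR or ICU.

```python
import unicodedata

# Hiragana letters U+3041..U+3096 sit exactly 0x60 below their
# katakana counterparts U+30A1..U+30F6, so a fixed offset converts them.
_HIRAGANA_TO_KATAKANA = {cp: cp + 0x60 for cp in range(0x3041, 0x3097)}


def normalize_reading(text: str) -> str:
    """Return a reading (yomi) as full-width katakana.

    Hypothetical helper: accepts hiragana, half-width katakana, or
    full-width katakana instead of rejecting the user's input.
    """
    # NFKC maps half-width katakana to full-width and composes a base
    # kana with a following half-width voicing mark into one character.
    widened = unicodedata.normalize("NFKC", text.strip())
    # Shift any hiragana to the matching katakana; other characters
    # pass through unchanged.
    return widened.translate(_HIRAGANA_TO_KATAKANA)


print(normalize_reading("ﾔﾏﾀﾞ"))    # half-width katakana input
print(normalize_reading("やまだ"))  # hiragana input
```

Note that NFKC also folds other compatibility characters (full-width
Latin letters, for example), which is probably acceptable for a reading
field but may not be what you want elsewhere.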

> prefill it via the Transliterator,

At first sight this sounds like a neat idea for Japanese. However, I
have never seen it done (and living in Japan, I would have had ample
occasion to see it). There is always a "pronunciation" (reading/yomi)
field, but it is never pre-filled. My guess is that the reason is that
there are just too many variations in Japanese names. For Chinese,
there is usually just one reading, and occasionally (as discussed in
this thread) two or more, but for Japanese, names with several possible
readings are much more common.

Regards,    Martin.

> make it visible to the user, let them fix it; sort by that.
> markus

Martin J. Dürst
Department of Intelligent Information Technology
College of Science and Engineering
Aoyama Gakuin University
Fuchinobe 5-1-10, Chuo-ku, Sagamihara
252-5258 Japan