Re: Chén, Shěn and 沈 pinyin confusion

Markus Scherer at
Tue Sep 13 16:47:02 CDT 2016

The Names variant of the Han-Latin transform (e.g., via ICU Transliterator)
should do this -- as a preprocessing step.
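
A minimal sketch of that preprocessing step, assuming Python with the
third-party PyICU bindings installed (the thread itself does not name a
language or binding). "Han-Latin/Names" is the ICU transform ID; the helper
names are made up for illustration. The readings you get depend on the ICU
data version, so no output is shown.

```python
from functools import lru_cache


@lru_cache(maxsize=1)
def _names_transliterator():
    # Imported lazily so the module loads even where PyICU is not installed.
    import icu  # PyICU (assumption: available in your environment)

    # "Han-Latin/Names" is the Names variant of the Han-Latin transform.
    return icu.Transliterator.createInstance("Han-Latin/Names")


def han_to_latin_names(text: str) -> str:
    """Return a Latin (pinyin) rendering of `text`, treating it as a name."""
    return _names_transliterator().transliterate(text)


# Usage (requires PyICU):
#   reading = han_to_latin_names("沈")
```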

The CLDR/ICU Collator does not currently offer a tailoring that would do
this automatically while sorting. Adding such a variant would add at
least a couple hundred kB to the data size.

For Chinese and Japanese, I suggest you add a pronunciation field (pinyin
for zh-CN, Hiragana for ja); prefill it via the Transliterator, make it
visible to the user, let them fix it; sort by that.
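
The workflow above might look roughly like this in Python. Everything here is
a hypothetical sketch: the Contact record, the stand-in guess table (which
takes the place of the Transliterator and deliberately gives the non-name
reading for 沈), and the sort key that strips tone marks before comparing.
In a real application the guess would come from the transform, and the
comparison would more likely go through a Collator.

```python
import unicodedata
from dataclasses import dataclass
from typing import Callable


@dataclass
class Contact:
    name: str
    pronunciation: str = ""  # pinyin for zh-CN, Hiragana for ja; user-editable


def prefill(contact: Contact, guess: Callable[[str], str]) -> None:
    """Fill the pronunciation field only if the user has not set one."""
    if not contact.pronunciation:
        contact.pronunciation = guess(contact.name)


def sort_key(contact: Contact):
    """Sort by pronunciation, ignoring tone marks; fall back to the name."""
    text = contact.pronunciation or contact.name
    decomposed = unicodedata.normalize("NFD", text)
    base = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return (base.casefold(), text)


# Hypothetical stand-in for the Transliterator (hand-written readings).
_STAND_IN = {
    "李白": "lǐ bái",
    "王维": "wáng wéi",
    "沈从文": "chén cóng wén",  # wrong for the surname; the user fixes it below
}


def stand_in_guess(name: str) -> str:
    return _STAND_IN.get(name, name)


contacts = [Contact("王维"), Contact("沈从文"), Contact("李白")]
for c in contacts:
    prefill(c, stand_in_guess)

before = [c.name for c in sorted(contacts, key=sort_key)]

# The field is visible, so the user corrects the surname reading.
contacts[1].pronunciation = "shěn cóng wén"

after = [c.name for c in sorted(contacts, key=sort_key)]
```

With the uncorrected guess, 沈从文 sorts under "chen" and comes first; after
the user's fix it sorts under "shen", between 李白 and 王维.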

