CLDR Ticket #9146 (accepted data)
Halfwidth Prolonged Sound Mark character doesn't collate correctly in Japanese locale
Reported by: eric.erhardt@…
Owned by: markus
When using a Collator created for the "ja" language/locale, two strings that differ only in character width do not collate correctly if they contain a prolonged sound mark (halfwidth U+FF70 or regular U+30FC).
The attached file contains ICU C++ code that illustrates the problem.
Markus Scherer had the following to say in an ICU support email:
This is the issue: halfwidth vs. fullwidth forms.
The Japanese sort order has special rules for length-mark-after-syllable, but only for the regular length mark, not for its halfwidth form.
It also does not seem to have a complete duplicate of the rules for the halfwidth syllable (the Ta) compared to its regular form.
The data is in CLDR: http://unicode.org/cldr/trac/browser/trunk/common/collation/ja.xml
It is curious that we have had this sort order for some twelve years but no one seems to have noticed or cared...
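For readers unfamiliar with the halfwidth/regular distinction mentioned in the email, the following is a minimal sketch (Python standard library only, not ICU) of how the two code points named in this ticket relate. It does not reproduce the collation behavior or the attached C++ code; it only shows that U+FF70 and the halfwidth Ta (U+FF80) carry a `<narrow>` compatibility decomposition to their regular forms. The helper name `fold_width` is hypothetical, and NFKC is used here as a stand-in for width folding even though it applies other compatibility mappings as well.

```python
import unicodedata

# Pairs mentioned in the ticket: (halfwidth form, regular form).
PAIRS = {
    "\uFF70": "\u30FC",  # prolonged sound mark
    "\uFF80": "\u30BF",  # katakana Ta
}


def fold_width(text: str) -> str:
    """Hypothetical helper: fold width variants to regular forms using NFKC.

    Assumption: NFKC is broader than pure width folding, so this is only
    an illustration, not what the "ja" collation rules do.
    """
    return unicodedata.normalize("NFKC", text)


for half, regular in PAIRS.items():
    print(f"U+{ord(half):04X} {unicodedata.name(half)}")
    print(f"  decomposition: {unicodedata.decomposition(half)}")
    print(f"  folds to U+{ord(regular):04X}: {fold_width(half) == regular}")
```

Folding both strings this way before comparison would hide the width difference entirely, so it is useful for inspecting test data rather than as a substitute for correct tailoring rules in ja.xml.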