[Unicode] Common Locale Data Repository : Bug Tracking

CLDR Ticket #9146 (accepted data)

Opened 3 years ago

Last modified 3 years ago

Halfwidth Prolonged Sound Mark character doesn't collate correctly in Japanese locale

Reported by: eric.erhardt@…
Owned by: markus
Component: collation
Data Locale: ja
Phase: rc
Review:
Weeks: 0.1
Data Xpath:


When using a Collator created for the "ja" locale, two strings that differ only in character width do not collate correctly when they contain a prolonged sound mark: U+FF70 (halfwidth) or U+30FC (regular).

See the attached code file that contains ICU C++ code that illustrates the problem.
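
The attachment itself is not reproduced on this page. As a rough illustration of what "differ only by character width" means for the code points named in this ticket, here is a minimal self-contained sketch. It does not use ICU and is not the attached main.cpp. The helpers foldWidth and differOnlyByWidth are hypothetical names introduced for illustration, and the mapping covers only the prolonged sound mark and the Ta syllable mentioned in the discussion.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not an ICU or CLDR API): maps the two halfwidth
// code points discussed in this ticket to their regular forms.
// Every other code point is returned unchanged.
char32_t foldWidth(char32_t c) {
    switch (c) {
        case 0xFF70: return 0x30FC;  // halfwidth prolonged sound mark -> regular
        case 0xFF80: return 0x30BF;  // halfwidth katakana Ta -> regular Ta
        default:     return c;
    }
}

// True when a and b are different strings that become identical once
// foldWidth is applied to every code point.
// Assumption: both strings have the same number of code points.
bool differOnlyByWidth(const std::u32string& a, const std::u32string& b) {
    if (a == b || a.size() != b.size()) return false;
    for (std::size_t i = 0; i < a.size(); ++i) {
        if (foldWidth(a[i]) != foldWidth(b[i])) return false;
    }
    return true;
}
```

For example, differOnlyByWidth(U"\u30BF\u30FC", U"\uFF80\uFF70") is true: the first string is regular Ta followed by U+30FC, and the second is halfwidth Ta followed by U+FF70. The equal-length check is a simplification, since halfwidth voiced syllables are written as two code points.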

Markus Scherer had the following to say in an ICU support email:

This is the issue: halfwidth vs. fullwidth forms.
The Japanese sort order has special rules for length-mark-after-syllable, but only for the regular length mark, not for its halfwidth form.
It also does not seem to have a complete duplicate of the rules for the halfwidth syllable (the Ta) compared to its regular form.

The data is in CLDR: http://unicode.org/cldr/trac/browser/trunk/common/collation/ja.xml

It is curious that we have had this sort order for some twelve years but no one seems to have noticed or cared...


main.cpp (1.1 KB) - added by anonymous 3 years ago.

Change History

Changed 3 years ago by anonymous

comment:1 Changed 3 years ago by markus

  • Phase changed from dsub to rc
  • Weeks set to 0.1

On second glance, it looks like we are only missing data for the halfwidth length mark; the data for halfwidth Ta, and likely for the other halfwidth forms, is probably complete.

comment:2 Changed 3 years ago by emmons

  • Owner changed from anybody to markus
  • Priority changed from assess to medium
  • Status changed from new to accepted
  • Milestone changed from UNSCH to upcoming

TC says "would be nice to have for 29..."

