Unicode Common Locale Data Repository: Bug Tracking

CLDR Ticket #10246 (accepted data)

Opened 3 months ago

Last modified 5 weeks ago

Updated Hangul collation tailoring

Reported by: kent.karlsson14@…
Owned by: markus
Component: collation
Data Locale: ko
Phase: rc

Description

The current tailoring does not properly handle historic Hangul letters and letter combinations, nor does it handle Hangul syllable clustering properly in collation. The attached tailoring addresses both shortcomings.

Unfortunately, ICU currently does not allow:
1) Unicode sets in collation prefix specifications. The tailoring should really use the set of Jamo L characters as the prefix in several places. Workaround: expand the rules for just a few Jamo L characters. Expanding them for all Jamo L characters (easily done by a script) would be impractical and would probably make sort key computation inefficient.
2) Prefixes in reset operations. Workaround: skip those tailorings for now. (Using contraction+expansion instead is apparently also disallowed for Hangul Jamo in ICU.)
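
For illustration only, here is a minimal sketch of the kind of script mentioned in item 1: it expands one prefix rule into one rule per Jamo L character. The Jamo L ranges are the Hangul_Syllable_Type=L ranges from the Unicode Character Database; the function names and the reset, relation, and target values are hypothetical placeholders and are not taken from the attached tailoring.

```python
# Sketch: expand "set of Jamo L as prefix" into individual prefix rules.
# Ranges with Hangul_Syllable_Type=L (leading consonants), per the UCD:
# Hangul Jamo U+1100..U+115F and Hangul Jamo Extended-A U+A960..U+A97C.
JAMO_L_RANGES = [(0x1100, 0x115F), (0xA960, 0xA97C)]


def jamo_l():
    """Return every Jamo L character as a one-character string."""
    return [chr(cp) for lo, hi in JAMO_L_RANGES for cp in range(lo, hi + 1)]


def expand_prefix_rule(reset, relation, target):
    """Emit one '&reset relation prefix|target' rule per Jamo L prefix.

    reset, relation and target are placeholders chosen by the caller;
    they are assumptions for this example, not values from the ticket.
    """
    return [f"&{reset} {relation} {prefix}|{target}" for prefix in jamo_l()]


if __name__ == "__main__":
    # Hypothetical rule: after any Jamo L, sort U+1161 tertiary-after U+1162.
    rules = expand_prefix_rule("\u1162", "<<<", "\u1161")
    print(len(rules), "rules generated")
    print(rules[0].encode("unicode_escape").decode("ascii"))
```

Even this small example produces one rule per Jamo L character for every rule that needs the prefix, which is why the full expansion is not used in the attached tailoring.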

Also: using [first trailing] instead of [last regular] (as the reset point for "heavy" characters) currently increases sort key lengths significantly. Hopefully that can be fixed in ICU.

Attachments

CLDRtailoring.txt (50.3 KB) - added by kent.karlsson14@… 3 months ago.
Updated collation tailoring for Hangul/Korean

Change History

Changed 3 months ago by kent.karlsson14@…

Updated collation tailoring for Hangul/Korean

comment:1 Changed 5 weeks ago by mark

  • Owner changed from anybody to markus
  • Phase changed from dsub to rc
  • Status changed from new to accepted
  • Milestone changed from UNSCH to 32