CLDR Ticket #6816 (closed enhancement: fixed)

Opened 22 months ago

Last modified 18 months ago

document script reordering vs. unassigned-implicits

Reported by: markus
Owned by: markus
Component: xxx-spec
Data Locale:
Phase:
Review: mark
Weeks: 0.05
Data Xpath:


LDML Section 3.12 Collation Reordering says "The IMPLICIT group is currently treated as if it were part of Hani."

This was due to a limitation of the hardcoded implicit-weights algorithm in ICU, where some Han-implicit weights shared a primary lead byte with some unassigned-implicit weights.

This is an unnecessary limitation. It is easy to use disjoint lead bytes for Han vs. unassigned implicit weights.
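
A minimal sketch of why disjoint lead bytes remove the limitation. The byte ranges and the helper names `shared_lead_bytes` and `group_for_lead_byte` are hypothetical, chosen only for illustration; they are not the values or functions used by ICU.

```python
# Hypothetical lead-byte ranges, chosen only for illustration;
# they are not the values used by ICU or FractionalUCA.txt.
HAN_LEAD_BYTES = range(0xE0, 0xE4)
UNASSIGNED_LEAD_BYTES = range(0xE4, 0xE8)


def shared_lead_bytes(a, b):
    """Return the sorted lead bytes that two groups have in common."""
    return sorted(set(a) & set(b))


def group_for_lead_byte(lead):
    """Map a primary lead byte to its group, or None if it is in neither."""
    if lead in HAN_LEAD_BYTES:
        return "Hani"
    if lead in UNASSIGNED_LEAD_BYTES:
        return "unassigned-implicit"
    return None


# Disjoint ranges: every lead byte identifies exactly one group,
# so the two groups can be moved independently.
print(shared_lead_bytes(HAN_LEAD_BYTES, UNASSIGNED_LEAD_BYTES))

# Overlapping ranges model the old limitation: one byte is ambiguous,
# so the groups sharing it have to move together.
print(shared_lead_bytes(range(0xE0, 0xE5), range(0xE4, 0xE8)))
```

With no shared byte, the lead byte alone is enough to decide which group a weight belongs to.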

I propose that we change this statement as follows:

  • Han-implicit weights reorder with the Hani script.
  • Unassigned-implicit weights reorder as the last weights in the "others" (Zzzz) group.

I believe that this is the most understandable behavior. For example, reorder="Zzzz Grek" sorts ... Hani, unassigned, Greek, TRAILING.
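
The proposed ordering can be sketched with a toy model. This assumes a default order of only four scripts and leaves out the special groups (space, punct, symbol, currency, digit); `apply_reorder`, `DEFAULT_ORDER`, and the two placeholder group names are hypothetical, not part of any ICU or CLDR API.

```python
# Toy default order of script groups; real data has many more scripts.
DEFAULT_ORDER = ["Latn", "Grek", "Cyrl", "Hani"]
UNASSIGNED = "<unassigned-implicit>"  # placeholder: no script code exists
TRAILING = "<TRAILING>"               # placeholder for the trailing weights


def apply_reorder(codes):
    """Return the group order produced by a list of reorder codes."""
    listed = [c for c in codes if c != "Zzzz"]
    # "Others": every script not listed explicitly, in default order,
    # with the unassigned-implicit weights as the last member.
    others = [s for s in DEFAULT_ORDER if s not in listed] + [UNASSIGNED]

    result = []
    seen_zzzz = False
    for code in codes:
        if code == "Zzzz":
            result.extend(others)
            seen_zzzz = True
        else:
            result.append(code)
    if not seen_zzzz:
        # Without an explicit Zzzz, the others follow the listed scripts.
        result.extend(others)
    result.append(TRAILING)  # trailing weights always sort last
    return result


print(apply_reorder(["Zzzz", "Grek"]))
```

In this model, reorder="Zzzz Grek" yields Latn, Cyrl, Hani, unassigned, Grek, TRAILING, which matches the order described above.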

Note that there is no script code for explicitly reordering the unassigned-implicit weights into a particular position. One would have to create a reordering list that explicitly includes a script from each normal group; Zzzz would then stand for only the remaining unassigned-implicit weights. I don't think there is a use case for a script code for "code points with no mappings".
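
As a rough illustration of that workaround, the sketch below builds such a list from a toy set of scripts. `DEFAULT_ORDER` and `reorder_codes_with_unassigned_at` are hypothetical names, and a real list would need a script from every normal group, not just these four.

```python
# Toy subset of scripts, one per "normal group"; hypothetical.
DEFAULT_ORDER = ("Latn", "Grek", "Cyrl", "Hani")


def reorder_codes_with_unassigned_at(index, scripts=DEFAULT_ORDER):
    """Name every script explicitly and insert "Zzzz" at `index`.

    Because every script is listed, "Zzzz" is left covering only the
    unassigned-implicit weights.
    """
    codes = list(scripts)
    codes.insert(index, "Zzzz")
    return codes


# Place the unassigned-implicit weights right after Latn.
print(" ".join(reorder_codes_with_unassigned_at(1)))
```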

Note: Unassigned-implicit weights are used for non-Hani code points without any mappings. For a given Unicode version, these are the code points with General_Category Cn (unassigned), Co (private use), or Cs (surrogate).
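
A minimal sketch of that category test, using Python's standard `unicodedata` module. The function name `is_unassigned_implicit` is hypothetical. The result depends on the Unicode version bundled with the interpreter (`unicodedata.unidata_version`), and the test only approximates the rule: it does not check whether a code point has an explicit mapping.

```python
import unicodedata

# General_Category values of code points that get unassigned-implicit weights.
UNASSIGNED_IMPLICIT_CATEGORIES = {"Cn", "Co", "Cs"}


def is_unassigned_implicit(cp):
    """Return True if the code point's General_Category is Cn, Co, or Cs."""
    return unicodedata.category(chr(cp)) in UNASSIGNED_IMPLICIT_CATEGORIES


for cp in (0x0041, 0x4E00, 0xE000, 0xD800):
    print(f"U+{cp:04X}", unicodedata.category(chr(cp)), is_unassigned_implicit(cp))
```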

I implemented the proposed behavior in my ICU "collv2" branch. Given disjoint lead bytes, this is trivial to implement.


Change History

comment:1 Changed 22 months ago by markus

  • Cc mark, pedberg, yoshito, emmons added

comment:2 Changed 22 months ago by srl

  • Owner changed from anybody to markus
  • Status changed from new to assigned
  • Milestone changed from UNSCH to 25final

comment:3 Changed 22 months ago by markus

Also document for FractionalUCA.txt that it does not have explicit mappings for implicit weights, that the particular lead bytes for Hani vs. implicits vs. trailing are a holdover from some ICU version, and that implementations are free to move them (if they also move the weights for the explicit "trailing" mappings, currently U+FFFD and U+FFFF).

comment:4 Changed 18 months ago by markus

  • Status changed from assigned to reviewing
  • Review set to mark

comment:5 Changed 18 months ago by mark

  • Status changed from reviewing to closed
  • Resolution set to fixed
