Unicode Common Locale Data Repository: Bug Tracking

CLDR Ticket #6816 (closed enhancement: fixed)

Opened 2 years ago

Last modified 21 months ago

document script reordering vs. unassigned-implicits

Reported by: markus
Owned by: markus
Component: xxx-spec
Data Locale:
Phase:
Review: mark
Weeks: 0.05
Data Xpath:


LDML Section 3.12 Collation Reordering says "The IMPLICIT group is currently treated as if it were part of Hani."

This was due to a limitation of the hardcoded implicit-weights algorithm in ICU, where some Han-implicit weights shared a primary lead byte with some unassigned-implicit weights.

This is an unnecessary limitation. It is easy to use disjoint lead bytes for Han vs. unassigned implicit weights.
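
To illustrate why disjoint lead bytes matter, here is a minimal sketch that assumes reordering works by permuting primary lead bytes. The group names, byte ranges, and function names are all hypothetical and do not reflect actual ICU or FractionalUCA.txt values.

```python
# Hypothetical, contiguous lead-byte ranges (inclusive) for three groups.
# Real implementations use different values; these are for illustration only.
GROUPS = {
    "Grek": (0xDE, 0xDF),
    "Hani": (0xE0, 0xE3),
    "unassigned": (0xE4, 0xE7),
}


def ranges_disjoint(groups):
    """Return True if no lead byte belongs to more than one group."""
    spans = sorted(groups.values())
    return all(prev[1] < cur[0] for prev, cur in zip(spans, spans[1:]))


def build_permutation(groups, new_order):
    """Build a 256-entry table mapping old lead bytes to new ones.

    Groups are packed in new_order, starting at the lowest lead byte used
    by any group. This only works if the ranges are disjoint: a lead byte
    shared by two groups could not be sent to two different places.
    """
    if not ranges_disjoint(groups):
        raise ValueError("groups share a lead byte and cannot move independently")
    table = list(range(256))
    next_byte = min(lo for lo, _ in groups.values())
    for name in new_order:
        lo, hi = groups[name]
        for old_byte in range(lo, hi + 1):
            table[old_byte] = next_byte
            next_byte += 1
    return table


table = build_permutation(GROUPS, ["Hani", "unassigned", "Grek"])
```

With these toy ranges, Hani and unassigned each move as a unit and Grek lands after both. If Hani and unassigned overlapped in even one lead byte, `build_permutation` would raise instead of producing a table.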

I propose that we change this statement as follows:

  • Han-implicit weights reorder with the Hani script.
  • Unassigned-implicit weights reorder as the last weights in the "others" (Zzzz) group.

I believe that this is the most understandable behavior. For example, reorder="Zzzz Grek" sorts in this order: ... Hani, unassigned, Greek, TRAILING.
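
The proposed group order can be sketched with a toy model. It is a simplification under stated assumptions: DEFAULT_ORDER is a small hypothetical subset of script groups, the special groups (space, punctuation, and so on) are omitted, and the function name is made up for illustration.

```python
# Toy model of the proposed behavior, not an implementation of LDML reordering.
# DEFAULT_ORDER is a hypothetical subset of script groups in default order.
DEFAULT_ORDER = ["Latn", "Grek", "Cyrl", "Hani"]
UNASSIGNED = "unassigned-implicit"
TRAILING = "TRAILING"


def reorder_groups(reorder):
    """Return the resulting group order for a reorder list such as 'Zzzz Grek'."""
    codes = reorder.split()
    listed = [code for code in codes if code != "Zzzz"]
    # "Others" are the scripts not listed explicitly, in default order.
    # Han-implicit weights travel with Hani, so they need no separate entry.
    others = [group for group in DEFAULT_ORDER if group not in listed]
    # Proposed: unassigned-implicit weights sort last within the "others" group.
    others.append(UNASSIGNED)
    if "Zzzz" in codes:
        position = codes.index("Zzzz")
        result = codes[:position] + others + codes[position + 1:]
    else:
        # Without Zzzz, the remaining groups follow the listed ones.
        result = codes + others
    return result + [TRAILING]


print(reorder_groups("Zzzz Grek"))
```

In this model, reorder="Zzzz Grek" places Latn, Cyrl, and Hani first, then the unassigned-implicit weights, then Grek, then TRAILING, which matches the example above.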

Note that there is no script code that explicitly reorders the unassigned-implicit weights into a particular position. To do that, one would have to write a reordering script code list that explicitly includes a script of each normal group, so that Zzzz stands only for the remaining unassigned-implicit weights. I don't think there is a use case for a script code for "code points with no mappings".

Note: Unassigned-implicit weights are used for non-Hani code points without any mappings. For a given Unicode version they are the code points with General_Category Cn (unassigned), Co (private use), or Cs (surrogate).
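
As a rough illustration of the category test in this note, here is a sketch using Python's standard `unicodedata` module. The function name is hypothetical. It checks only the General_Category, using the Unicode version bundled with the Python runtime, and does not consult any collation data to see whether a code point has an explicit mapping.

```python
import unicodedata

# General_Category values named in the note above.
UNASSIGNED_CATEGORIES = {"Cn", "Co", "Cs"}


def has_unassigned_category(code_point):
    """Return True if the code point's General_Category is Cn, Co, or Cs.

    This is only the category check; whether a code point really gets an
    unassigned-implicit weight also depends on the collation data in use.
    """
    return unicodedata.category(chr(code_point)) in UNASSIGNED_CATEGORIES


for cp in (0x0041, 0x4E00, 0xD800, 0xE000, 0x10FFFF):
    print(f"U+{cp:04X}", unicodedata.category(chr(cp)), has_unassigned_category(cp))
```

The result depends on `unicodedata.unidata_version`, which mirrors the "for a given Unicode version" caveat: a code point that is Cn in one version may be assigned in a later one.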

I implemented the proposed behavior in my ICU "collv2" branch. Given disjoint lead bytes, this is trivial to implement.


Change History

comment:1 Changed 2 years ago by markus

  • Cc mark, pedberg, yoshito, emmons added

comment:2 Changed 2 years ago by srl

  • Owner changed from anybody to markus
  • Status changed from new to assigned
  • Milestone changed from UNSCH to 25final

comment:3 Changed 2 years ago by markus

Also document for FractionalUCA.txt that it does not have explicit mappings for implicit weights, that the particular lead bytes for Hani vs. implicits vs. trailing are a holdover from some ICU version, and that implementations are free to move them (if they also move the weights for explicit "trailing" mappings, currently U+FFFD and U+FFFF).

comment:4 Changed 21 months ago by markus

  • Status changed from assigned to reviewing
  • Review set to mark

comment:5 Changed 21 months ago by mark

  • Status changed from reviewing to closed
  • Resolution set to fixed
