CLDR Ticket #6816 (closed enhancement: fixed)
document script reordering vs. unassigned-implicits
Reported by: markus | Owned by: markus
LDML Section 3.12 Collation Reordering says "The IMPLICIT group is currently treated as if it were part of Hani."
This was due to a limitation of the hardcoded implicit-weights algorithm in ICU, where some Han-implicit weights shared a primary lead byte with some unassigned-implicit weights.
This is an unnecessary limitation. It is easy to use disjoint lead bytes for Han vs. unassigned implicit weights.
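To illustrate why a shared lead byte ties two groups together, here is a minimal sketch. It assumes that reordering moves whole primary lead bytes at a time, so two groups can only be placed independently if no lead byte carries weights from both. The byte values and the helper name can_reorder_independently are hypothetical and are not ICU's actual assignments or API.

```python
# Hypothetical lead-byte tables, for illustration only (not ICU's real values).
# Each primary lead byte maps to the set of groups whose weights start with it.
SHARED = {
    0xE0: {"Han-implicit"},
    0xE1: {"Han-implicit", "unassigned-implicit"},  # one byte shared by both groups
    0xE2: {"unassigned-implicit"},
}
DISJOINT = {
    0xE0: {"Han-implicit"},
    0xE1: {"Han-implicit"},
    0xE2: {"unassigned-implicit"},
}


def can_reorder_independently(lead_bytes, group_a, group_b):
    """Return True if no single lead byte carries weights from both groups."""
    return not any(
        group_a in groups and group_b in groups
        for groups in lead_bytes.values()
    )


print(can_reorder_independently(SHARED, "Han-implicit", "unassigned-implicit"))
print(can_reorder_independently(DISJOINT, "Han-implicit", "unassigned-implicit"))
```

With the shared table the check fails, which corresponds to the old behavior where IMPLICIT had to travel with Hani. With the disjoint table the two groups can be placed separately.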
I propose that we change this statement as follows:
- Han-implicit weights reorder with the Hani script.
- Unassigned-implicit weights reorder as the last weights in the "others" (Zzzz) group.
I believe that this is the most understandable behavior. For example, reorder="Zzzz Grek" sorts ... Hani, unassigned, Greek, TRAILING.
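The proposed behavior can be sketched as a small model. This is only an illustration under simplifying assumptions: the default script list is a shortened, hypothetical subset, the special groups (space, punct, and so on) are left out, and the function name reordered_groups is made up for this example. Han-implicit weights are treated as part of the Hani entry.

```python
# Shortened, hypothetical default script order (not the full CLDR list).
# "Hani" here stands for the Han script including Han-implicit weights.
DEFAULT_SCRIPTS = ["Latn", "Grek", "Cyrl", "Hani"]
UNASSIGNED = "unassigned-implicit"
TRAILING = "TRAILING"


def reordered_groups(reorder, default_scripts=DEFAULT_SCRIPTS):
    """Return the group order for a reorder list under the proposed rules."""
    codes = list(reorder)
    # If Zzzz is not given, the "others" follow the explicitly listed codes.
    if "Zzzz" not in codes:
        codes.append("Zzzz")
    listed = [c for c in codes if c != "Zzzz"]
    # "Others" are the unlisted scripts in default order, with the
    # unassigned-implicit weights as the last weights of that group.
    others = [s for s in default_scripts if s not in listed] + [UNASSIGNED]
    result = []
    for code in codes:
        if code == "Zzzz":
            result.extend(others)
        else:
            result.append(code)
    # TRAILING weights always stay at the end.
    result.append(TRAILING)
    return result


print(reordered_groups(["Zzzz", "Grek"]))
# ['Latn', 'Cyrl', 'Hani', 'unassigned-implicit', 'Grek', 'TRAILING']
```

In this model, the tail of the result for "Zzzz Grek" is Hani, unassigned, Greek, TRAILING, matching the example above.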
Note that there is no script code to explicitly reorder the unassigned-implicit weights into a particular position; one would have to create a reordering script code list that explicitly includes a script of each normal group, and then Zzzz would stand for the remaining unassigned-implicit weights. I don't think there is a use case for a script code for "code points with no mappings".
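As a sketch of that workaround: if every normal group is listed explicitly, the only thing left for Zzzz to stand for is the unassigned-implicit weights, so the position of Zzzz in the list determines where they sort. The helper name and the short script list below are hypothetical; a real list would need a code for every normal group, not just these four.

```python
def reorder_list_with_unassigned_at(all_group_codes, position):
    """Build a reorder string that names every group and puts Zzzz at position.

    Assumes all_group_codes really covers every normal group, so that Zzzz
    stands only for the remaining unassigned-implicit weights.
    """
    codes = list(all_group_codes)
    codes.insert(position, "Zzzz")
    return " ".join(codes)


# Hypothetical, shortened list of groups for illustration.
groups = ["Latn", "Grek", "Cyrl", "Hani"]
print(reorder_list_with_unassigned_at(groups, 2))
# Latn Grek Zzzz Cyrl Hani
```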
Note: Unassigned-implicit weights are used for non-Hani code points without any mappings. For a given Unicode version, these are the code points with General_Category Cn (unassigned), Co (private use), or Cs (surrogate).
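A rough way to check which code points fall into these categories is Python's standard unicodedata module. This is only an approximation: the result depends on the Unicode version bundled with the Python build, and the check looks at the general category only, not at whether a particular collation data set actually lacks a mapping. The function name is made up for this example.

```python
import unicodedata

# General categories named in the note above.
UNASSIGNED_IMPLICIT_CATEGORIES = {"Cn", "Co", "Cs"}


def may_get_unassigned_implicit_weight(code_point):
    """Return True if the code point's general category is Cn, Co, or Cs.

    Uses the Unicode version of the running Python's unicodedata module.
    """
    return unicodedata.category(chr(code_point)) in UNASSIGNED_IMPLICIT_CATEGORIES


print(may_get_unassigned_implicit_weight(0xE000))  # private use (Co)
print(may_get_unassigned_implicit_weight(0xD800))  # surrogate (Cs)
print(may_get_unassigned_implicit_weight(0x4E00))  # Han ideograph (Lo)
```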
I implemented the proposed behavior in my ICU "collv2" branch. Given disjoint lead bytes, this is trivial to implement.
- Owner changed from anybody to markus
- Status changed from new to assigned
- Milestone changed from UNSCH to 25final
- Status changed from assigned to reviewing
- Review set to mark