CLDR Ticket #5081 (closed enhancement: fixed)
change FractionalUCA to use new syntax to refer to weights
Reported by: markus | Owned by: markus
For UCA 6.2.1, probably.
UCA 6.2 CollationAuxiliary.html adds a section detailing "Implicit Fractional Weight Generation".
As mentioned in the new section: “It is important to match the exact algorithm so that the weights for compatibility decomposable characters match.”
This is problematic. The current implicit-weight algorithm has to change every time a CJK ideograph is added to Unicode, so the code must be precisely synchronized with the FractionalUCA version. Hardcoding the results of the algorithm in the FractionalUCA file also makes it hard to use a different implicit-weight algorithm.
It would be much better to use a simple syntax to refer to the weights or CEs of characters.
- Change FractionalUCA to use new syntax to refer to weights. In particular, change the FractionalUCA data to use this new syntax for all weights and/or CEs for characters whose decompositions contain Unified Ideographs, so that the implicit weights do not appear in the file.
- Remove the Implicit Fractional Weight Generation text from CollationAuxiliary.html.
- In CollationAuxiliary.html, describe the new syntax instead.
- [U+4E00] copies the entire CE of U+4E00.
- [U+4E00, 09] copies the primary & secondary weights and sets the tertiary weight to 09.
- [U+4E00, 05, 09] copies the primary weight and sets the secondary & tertiary weights to 05/09.
Each reference must name exactly one code point, and that code point must have a single CE.
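The three reference forms above could be resolved roughly as follows. This is only a minimal sketch, not the CLDR tooling: the function name `resolve_reference`, the table layout (code point mapped to a list of `(primary, secondary, tertiary)` tuples of hex-byte strings), and the restriction to single-byte secondary and tertiary overrides are all assumptions made for illustration. The primary `E0 04 06` and secondary `05` for U+4E00 are taken from the PARENTHESIZED IDEOGRAPH ONE entry in this ticket; the tertiary `05` is assumed.

```python
import re

# Matches "[U+XXXX]", "[U+XXXX, tt]" and "[U+XXXX, ss, tt]".
# Assumption: override weights are written as single hex bytes.
REF_RE = re.compile(
    r"\[U\+([0-9A-Fa-f]{4,6})((?:\s*,\s*[0-9A-Fa-f]{2}){0,2})\s*\]"
)


def resolve_reference(ref, table):
    """Return the (primary, secondary, tertiary) CE that `ref` stands for.

    `table` is a hypothetical mapping: code point -> list of CE tuples.
    """
    m = REF_RE.fullmatch(ref.strip())
    if m is None:
        raise ValueError(f"not a weight reference: {ref!r}")
    cp = int(m.group(1), 16)
    overrides = re.findall(r"[0-9A-Fa-f]{2}", m.group(2))

    ces = table[cp]
    if len(ces) != 1:
        # The ticket requires the referenced code point to have a single CE.
        raise ValueError(f"U+{cp:04X} has {len(ces)} CEs, expected exactly 1")

    primary, secondary, tertiary = ces[0]
    if len(overrides) == 1:
        # [U+4E00, 09]: keep primary and secondary, replace the tertiary.
        tertiary = overrides[0]
    elif len(overrides) == 2:
        # [U+4E00, 05, 09]: keep the primary only.
        secondary, tertiary = overrides
    return (primary, secondary, tertiary)


# Illustrative data only; the tertiary weight of U+4E00 is an assumption.
EXAMPLE_TABLE = {
    0x4E00: [("E0 04 06", "05", "05")],
    0x00E4: [("27", "05", "05"), ("", "9D", "05")],  # two CEs: not referable
}

print(resolve_reference("[U+4E00]", EXAMPLE_TABLE))
print(resolve_reference("[U+4E00, 09]", EXAMPLE_TABLE))
print(resolve_reference("[U+4E00, 05, 09]", EXAMPLE_TABLE))
```

With a table like this, the file only needs to name the ideograph; whichever implicit-weight algorithm fills in the table determines the primary weight.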
This would change CJK COMPATIBILITY IDEOGRAPH-F967
F967; [E0 04 20, 05, 05]
and PARENTHESIZED IDEOGRAPH ONE
3220; [0A B5, 05, 09][E0 04 06, 05, 09][0A B7, 05, 3D]
to
3220; [0A B5, 05, 09][U+4E00, 09][0A B7, 05, 3D]
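As a quick check that the new form carries the same information, a reader of the file could expand each reference back into explicit weights. The sketch below is hypothetical (the name `expand_entry` and the `single_ces` mapping are not part of any CLDR tool) and assumes single-byte override weights. It expands the PARENTHESIZED IDEOGRAPH ONE line with the U+4E00 primary and secondary from this ticket; the stored tertiary `05` is an assumption and is overridden here anyway.

```python
import re

# A reference looks like [U+XXXX], [U+XXXX, tt] or [U+XXXX, ss, tt].
# Ordinary CEs such as [0A B5, 05, 09] do not start with "U+" and are untouched.
REF = re.compile(
    r"\[U\+([0-9A-Fa-f]{4,6})((?:\s*,\s*[0-9A-Fa-f]{2}){0,2})\s*\]"
)


def expand_entry(line, single_ces):
    """Replace every weight reference in `line` with explicit weights.

    `single_ces` is a hypothetical mapping from a code point to its one
    (primary, secondary, tertiary) CE, each weight a hex-byte string.
    """
    def substitute(match):
        primary, secondary, tertiary = single_ces[int(match.group(1), 16)]
        overrides = re.findall(r"[0-9A-Fa-f]{2}", match.group(2))
        if len(overrides) == 1:      # tertiary override only
            tertiary = overrides[0]
        elif len(overrides) == 2:    # secondary and tertiary overrides
            secondary, tertiary = overrides
        return f"[{primary}, {secondary}, {tertiary}]"

    return REF.sub(substitute, line)


# Illustrative data; the tertiary weight stored for U+4E00 is an assumption.
single_ces = {0x4E00: ("E0 04 06", "05", "05")}

new_line = "3220; [0A B5, 05, 09][U+4E00, 09][0A B7, 05, 3D]"
print(expand_entry(new_line, single_ces))
# → 3220; [0A B5, 05, 09][E0 04 06, 05, 09][0A B7, 05, 3D]
```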
It would also allow making other entries more readable, for example changing
LATIN SMALL LETTER A WITH DIAERESIS
00E4; [27, 05, 05][, 9D, 05]
but this has low priority.
- Owner changed from anybody to markus
- Priority changed from assess to medium
- Status changed from new to assigned
- Cc mark, yoshito, pedberg, emmons added
- Status changed from assigned to accepted
- Review set to emmons