Description of Issue:
Draft updated 2013-03-18. Some of the 3.x sections of UTS #10 were reordered for better flow. See the
Modifications for details.
The Unicode Collation Algorithm has been modified for certain edge cases, in
particular:
- The Variable-Weighting option IgnoreSP has been removed because it did
not prove to be useful.
- The highest primary weights FFFD..FFFF have been reserved for special
collation elements.
- The data files for the CLDR root collation will be published via CLDR
releases rather than as a set of files (in CollationAuxiliary.zip) in the
UCA data directory.
See UTS #35: LDML: Collation, Section 2.1,
Root Collation Data Files.
For more details, see Modifications. In addition, there are a number of
minor changes for clarity and consistency of the text.
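As an illustration of the reserved range mentioned above, the following minimal sketch flags primary weights that fall in FFFD..FFFF. The function names and the idea of screening a list of tailored weights are assumptions made for this example; they are not part of UTS #10 or of any library.

```python
# Primary weights reserved for special collation elements, per the text above.
RESERVED_PRIMARY_MIN = 0xFFFD
RESERVED_PRIMARY_MAX = 0xFFFF


def is_reserved_primary(weight):
    """Return True if a primary weight lies in the reserved range FFFD..FFFF."""
    return RESERVED_PRIMARY_MIN <= weight <= RESERVED_PRIMARY_MAX


def reserved_collisions(primaries):
    """Return the weights from an iterable that fall in the reserved range.

    Hypothetical helper: a tailoring tool might use a check like this to
    avoid assigning ordinary characters to the reserved weights.
    """
    return [w for w in primaries if is_reserved_primary(w)]
```

For example, `reserved_collisions([0x1000, 0xFFFE])` returns a list containing only `0xFFFE`.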
The DUCET has also been modified, in particular:
- All digits map to collation elements with the default secondary weight
(0020 in the DUCET, "<BASE>" in the ISO 14651 CTT) rather than
script-specific secondary weights for non-ASCII digits.
- In expansions, trailing collation elements use regular tertiary weights
rather than MAX = 1F. The MAX tertiary weight is no longer used in the
DUCET.
- Fourth-level weights have been removed from the DUCET data file. (These
have not been used by the UCA.)
See the draft DUCET for UCA 6.3 here:
http://www.unicode.org/Public/UCA/6.3.0/
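The DUCET changes listed above can be spot-checked mechanically. The sketch below assumes that collation elements are written in the bracketed, dot-separated hexadecimal notation used in the DUCET data file (for example `[.1234.0020.0002]`). The helper names are hypothetical, and the weights shown in the usage note are made up for illustration rather than taken from the actual data.

```python
import re

DEFAULT_SECONDARY = 0x0020  # default secondary weight in the DUCET, per the text
OLD_MAX_TERTIARY = 0x001F   # MAX tertiary weight, no longer used in the DUCET

# One collation element: '[', a '.' or '*' marker, dot-separated hex weights, ']'.
CE_PATTERN = re.compile(r"\[([.*])([0-9A-Fa-f]{4,6}(?:\.[0-9A-Fa-f]{4,6})*)\]")


def parse_collation_elements(text):
    """Return a list of (is_variable, [weights]) for each element found in text."""
    elements = []
    for marker, body in CE_PATTERN.findall(text):
        weights = [int(part, 16) for part in body.split(".")]
        elements.append((marker == "*", weights))
    return elements


def find_problems(text, is_digit=False):
    """Report deviations from the three changes described above.

    is_digit is supplied by the caller (an assumption of this sketch); the
    function does not look up character properties itself.
    """
    problems = []
    for _, weights in parse_collation_elements(text):
        if len(weights) != 3:
            # Fourth-level weights have been removed from the data file.
            problems.append("expected 3 weight levels, got %d" % len(weights))
            continue
        _, secondary, tertiary = weights
        if tertiary == OLD_MAX_TERTIARY:
            problems.append("tertiary weight 001F (MAX) is no longer used")
        if is_digit and secondary != DEFAULT_SECONDARY:
            problems.append("digit has non-default secondary weight %04X" % secondary)
    return problems
```

With made-up weights, `find_problems("[.1234.0020.0002]", is_digit=True)` returns an empty list, while `find_problems("[.1234.0020.0002.0661]")` reports the extra fourth level.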
For information about how to discuss this issue and how to supply
formal feedback, please see the feedback and discussion instructions.
The accumulated feedback received so far on this issue is shown below,
or you can look at a full page view.