CLDR Ticket #5866 (closed defect: fixed)
LDML collation vs. U+0344 vs. overlap closure
Reported by: | markus | Owned by: | markus |
---|---|---|---|
Component: | xxx-spec | Data Locale: | |
Phase: | | Review: | mark |
Weeks: | 0.1 | Data Xpath: | |
Xref: | | | |
Description
Follow-up to ticket:5667. I looked at Richard's example for U+0344 again and realized that he had omitted some contractions from the canonical closure (see my reply on the Unicode mailing list, 2013-04-02). When those are added, the canonically closed mappings, including the overlap closure (which adds contractions formed from overlaps between input contractions and decomposition mappings), collate FCD input the same as NFD input. This holds for FCD input in general, excluding the Tibetan composite vowels but including U+0344. However, the overlap-closed mappings collate some NFD input differently than mappings without the overlap closure.
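For readers less familiar with FCD, here is a minimal sketch of the FCD string test, assuming the usual definition from UTN #5: a string is FCD if, for each adjacent pair of characters, the trailing combining class of the first (taken from its full canonical decomposition) is not greater than the leading combining class of the second, unless that leading class is 0. The function names are hypothetical and this uses Python's `unicodedata` rather than any ICU API. It shows why U+0344 is interesting: it decomposes to U+0308 U+0301, so both its leading and trailing classes are 230.

```python
import unicodedata


def lead_trail_ccc(ch: str) -> tuple[int, int]:
    """Leading and trailing canonical combining class of one character,
    taken from its full canonical decomposition (NFD)."""
    d = unicodedata.normalize("NFD", ch)
    return unicodedata.combining(d[0]), unicodedata.combining(d[-1])


def is_fcd(s: str) -> bool:
    """Hypothetical FCD check: fails when a nonzero leading ccc is lower
    than the trailing ccc of the preceding character."""
    prev_trail = 0
    for ch in s:
        lead, trail = lead_trail_ccc(ch)
        if lead != 0 and prev_trail > lead:
            return False
        prev_trail = trail
    return True
```

For example, `lead_trail_ccc("\u0344")` is `(230, 230)`, `"a\u0344"` is FCD, and `"\u0344\u0323"` is not (230 is followed by 220). A Tibetan composite vowel such as U+0F73 has combining class 0 itself but decomposes to U+0F71 U+0F72, giving leading/trailing classes 129/130, which hints at why those characters need separate treatment.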
I think we should remove U+0344 from the FCD exclusions, to which I added it a few weeks ago. Instead, we should document that:
- An implementation that does not add the overlap contractions (like ICU currently) will get different results for some FCD input than for the equivalent NFD input (the ICU User Guide lists this as a limitation).
- An implementation that does add the overlap contractions will get different results for some NFD input than an implementation that does not.
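To make "adding the overlap contractions" concrete, here is a deliberately simplified sketch in Python. The function name and the way a new mapping is represented are hypothetical, not ICU's or LDML's: it only looks for a contraction whose suffix equals a proper prefix of some character's canonical decomposition, and proposes the string with the precomposed character as a new contraction. Real closure must also handle discontiguous matches and context after the overlap, which this ignores.

```python
import unicodedata


def overlap_contractions(
    contractions: list[str], chars: list[str]
) -> dict[str, tuple[str, str]]:
    """Hypothetical overlap closure: map each new contraction to
    (existing contraction whose weights come first, rest of the
    decomposition whose weights follow)."""
    added: dict[str, tuple[str, str]] = {}
    for c in contractions:
        for ch in chars:
            d = unicodedata.normalize("NFD", ch)
            if d == ch:
                continue  # no canonical decomposition, nothing to overlap
            # k = overlap length; keep part of c before it and part of d after it
            for k in range(1, min(len(c), len(d))):
                if c[-k:] == d[:k]:
                    added[c[:-k] + ch] = (c, d[k:])
    return added
```

With a hypothetical tailored contraction a+U+0308 and the sample characters U+0344, ä and b, the only proposal is a+U+0344, mapped to the weights of a+U+0308 followed by those of U+0301. Without it, FCD input a+U+0344 would not match the contraction, because the U+0308 is hidden inside the precomposed mark.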
We'd like you to attend and explain in more detail what is going on and what the implications are.