CLDR Ticket #5866 (closed defect: fixed)
LDML collation vs. U+0344 vs. overlap closure
Reported by: markus | Owned by: markus
Follow-up to ticket:5667. I looked at Richard's example for U+0344 again and realized that he had omitted some contractions from the canonical closure (see my reply on the unicode list, 2013apr02). When those are added, the canonically-closed mappings will collate FCD input the same as NFD input. This includes the overlap closure, which adds contractions arising from overlaps between input contractions and decomposition mappings. (Here "FCD" means FCD input excluding the Tibetan composite vowels but including U+0344.) However, the overlap-closed mappings collate some NFD input differently than non-overlap-closed mappings do.
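For readers unfamiliar with why U+0344 is special here, the following is a minimal sketch using only Python's standard `unicodedata` module. It is not ICU or CLDR code; the helper names `is_fcd` and `is_nfd` are made up for illustration. It shows that a string containing U+0344 can pass the FCD check while not being in NFD, because U+0344 canonically decomposes to U+0308 U+0301. That is the situation in which a contraction ending in U+0308 can overlap the decomposition.

```python
import unicodedata


def is_fcd(s: str) -> bool:
    """Illustrative FCD check (hypothetical helper, not an ICU API).

    A string is FCD if, for every adjacent pair of characters, the
    trailing combining class of the first character's NFD form is not
    greater than the leading combining class of the second character's
    NFD form, unless that leading class is 0.
    """
    prev_trail = 0
    for ch in s:
        nfd = unicodedata.normalize("NFD", ch)
        lead = unicodedata.combining(nfd[0])
        trail = unicodedata.combining(nfd[-1])
        if lead != 0 and prev_trail > lead:
            return False
        prev_trail = trail
    return True


def is_nfd(s: str) -> bool:
    """True if s is already in NFD (hypothetical helper)."""
    return unicodedata.normalize("NFD", s) == s


# "a" followed by U+0344 COMBINING GREEK DIALYTIKA TONOS.
fcd_text = "a\u0344"
# NFD expands U+0344 to U+0308 U+0301.
nfd_text = unicodedata.normalize("NFD", fcd_text)

print(is_fcd(fcd_text), is_nfd(fcd_text))
print([f"U+{ord(c):04X}" for c in nfd_text])
```

A collator that accepts FCD input without normalizing would see `a` + U+0344 as is, so a contraction defined on `a` + U+0308 only matches if the closure has added a corresponding mapping for the U+0344 form.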
I think we should remove U+0344 from the FCD exclusions, where I added it a few weeks ago. Instead, we should document that:
- An implementation (like ICU currently) which does not add the overlap contractions will get some different FCD/NFD results (which the ICU User Guide lists as a limitation).
- An implementation that does add the overlaps will get some different results for NFD than an implementation that doesn't add the overlaps.
- Owner changed from anybody to markus
- Priority changed from assess to medium
- Status changed from new to assigned
- Milestone changed from UNSCH to 24final
- Status changed from assigned to reviewing
- Review set to mark