CLDR Ticket #6850 (closed enhancement: fixed)
collation algorithm: fall back to shorter prefixes
Reported by: markus | Owned by: markus
Description (last modified by pedberg)
LDML 24 says to fall back from mappings with the longest matching prefix directly to mappings with no prefix.
Richard Wordingham pointed out that this would yield different results for NFD input than for composite (precomposed) characters:
Consider just having two extra mappings, for op|č and p|ç. For NFD input we have CE(opç) = CE(o)CE(p)CE(c)CE(\u0327): 'c' has prefixes 'op' and 'p', 'op' is the longest matching prefix, its only mapping (op|č) does not match the text, and the lookup then falls back directly to mappings with no prefix. However, if one looks for mappings starting with the composite character 'ç', the only prefix one sees is 'p', and one would derive the inconsistent result CE(opç) = CE(o)CE(p)CE(p|ç).
Mark also pointed out that adding a mapping with a longer prefix could hide mappings with shorter prefixes, which would be counter-intuitive.
We should modify the fallback to go from mappings with the longest matching prefix to mappings with the next-longest prefix, and so on, ultimately to mappings with no prefix.
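The difference between the two fallback rules can be illustrated with a small Python sketch of the op|č and p|ç example. This is only a toy model, not the ICU implementation: the table `PREFIXED`, the function `collation_elements`, and the `stepwise` flag are hypothetical names, collation elements are represented by string labels, and the table is assumed to hold each mapping in both composite and NFD form.

```python
import unicodedata

# Hypothetical toy table: (prefix, string) -> label standing in for real CEs.
# Each mapping is stored in composite and in NFD form (an assumption of this sketch).
PREFIXED = {}
for prefix, s in [("op", "\u010D"), ("p", "\u00E7")]:  # op|č and p|ç
    label = f"CE({prefix}|{s})"
    PREFIXED[(prefix, s)] = label
    PREFIXED[(prefix, unicodedata.normalize("NFD", s))] = label


def collation_elements(text, stepwise):
    """Return CE labels for text.

    stepwise=False: try only the longest matching prefix, then no prefix
                    (the LDML 24 wording).
    stepwise=True:  try matching prefixes from longest to shortest, then
                    no prefix (the fallback proposed in this ticket).
    """
    out = []
    i = 0
    while i < len(text):
        # Prefixes that have some mapping starting with text[i]
        # and that match the text immediately before position i.
        prefixes = sorted(
            {p for (p, s) in PREFIXED
             if s[0] == text[i] and text[:i].endswith(p)},
            key=len, reverse=True)
        if not stepwise:
            prefixes = prefixes[:1]
        match = None
        for p in prefixes:
            # Longest string mapped under this prefix that matches at i.
            found = [s for (q, s) in PREFIXED
                     if q == p and text.startswith(s, i)]
            if found:
                match = (p, max(found, key=len))
                break
        if match:
            out.append(PREFIXED[match])
            i += len(match[1])
        else:
            # No prefixed mapping applies: use the no-prefix mapping.
            out.append(f"CE(U+{ord(text[i]):04X})")
            i += 1
    return out


nfd = "opc\u0327"      # o, p, c, combining cedilla
composite = "op\u00E7"  # o, p, ç

for stepwise in (False, True):
    print(stepwise,
          collation_elements(nfd, stepwise),
          collation_elements(composite, stepwise))
```

With `stepwise=False` the NFD and composite inputs produce different label sequences, as in Richard's example. With `stepwise=True` both produce the same sequence, ending in the p|ç mapping.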
I have implemented this in my ICU "collv2" branch (ICU r34698).
- Owner changed from anybody to markus
- Status changed from new to assigned
- Milestone changed from UNSCH to 25final
- Status changed from assigned to reviewing
- Review set to pedberg