CLDR Ticket #9799 (accepted data)
Language matching: rule for chr -> en seems incomplete
|Reported by:||rspeer@…||Owned by:||mark|
The language matching data (http://unicode.org/repos/cldr/tags/latest/common/supplemental/languageInfo.xml) has many fallback rules indicating that a user who understands a given language (in its usual script) may also understand a different language (in its own usual script, which is different). One example is the matching of bn_Beng -> en_Latn, at a distance of 10.
There is a rule for matching chr -> en at a distance of 10, but unlike these other fallback rules, it does not include the difference in scripts.
If I understand the matching process (http://unicode.org/reports/tr35/#LanguageMatching) properly, this happens:
- The tags 'chr' and 'en' are maximized to 'chr_Cher_US' and 'en_Latn_US'.
- The region tags match exactly, so our match distance so far is 0. We go on to comparing 'chr_Cher' to 'en_Latn'.
- The highest-priority rule that matches 'chr_Cher' to 'en_Latn' is the rule that matches '*_*' to '*_*' at a distance of 40. We add 40 to the match distance, and remove the final subtags, proceeding to compare 'chr' to 'en'.
- 'chr' matches 'en' at a distance of 10, so we add 10 to the match distance. The final match distance is 50.
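The steps above can be sketched in Python. This is only an illustration of the process as described in this ticket, not the CLDR or ICU implementation: the names LIKELY_SUBTAGS, RULES, pattern_matches, level_distance and match_distance are hypothetical, the rule table holds only the two rules quoted above, and the sketch assumes that equal final subtags contribute a distance of 0 at every level, as in the region step.

```python
# Hypothetical stand-in for likely-subtags maximization (only the two tags from the ticket).
LIKELY_SUBTAGS = {"chr": "chr_Cher_US", "en": "en_Latn_US"}

# (desired, supported, distance) in priority order; only the two rules quoted in the ticket.
RULES = [
    ("chr", "en", 10),
    ("*_*", "*_*", 40),
]


def pattern_matches(pattern, subtags):
    """True if a rule pattern such as '*_*' matches a list of subtags field by field."""
    fields = pattern.split("_")
    return len(fields) == len(subtags) and all(
        f == "*" or f == s for f, s in zip(fields, subtags)
    )


def level_distance(desired, supported):
    """Distance contributed by one level of the comparison."""
    # Assumption: equal final subtags add nothing, as in the region step above.
    if desired[-1] == supported[-1]:
        return 0
    for rule_desired, rule_supported, distance in RULES:
        if pattern_matches(rule_desired, desired) and pattern_matches(rule_supported, supported):
            return distance
    raise LookupError(f"no rule for {desired} -> {supported}")


def match_distance(desired_tag, supported_tag):
    """Maximize both tags, then sum the per-level distances while stripping final subtags."""
    desired = LIKELY_SUBTAGS[desired_tag].split("_")
    supported = LIKELY_SUBTAGS[supported_tag].split("_")
    total = 0
    while desired:
        total += level_distance(desired, supported)
        desired, supported = desired[:-1], supported[:-1]
    return total


print(match_distance("chr", "en"))  # 0 (region) + 40 (script) + 10 (language) = 50
```

With only these two rules, the script-level comparison falls through to the '*_*' wildcard, which is where the extra 40 comes from.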
50 is a very large distance compared to most matches. It seems that a Cherokee user is presumed to find written English nearly indecipherable unless it is English written in the Cherokee alphabet (which would be very silly).
It appears to me that what's missing is a rule that matches 'chr_Cher' to 'en_Latn' at a distance of 10.