Unicode Common Locale Data Repository: Bug Tracking

CLDR Ticket #9799 (accepted data)

Opened 21 months ago

Last modified 21 months ago

Language matching: rule for chr -> en seems incomplete

Reported by: rspeer@…
Owned by: mark
Component: supplemental
Data Locale:
Phase: rc
Review:
Weeks:
Data Xpath:


The language matching data (http://unicode.org/repos/cldr/tags/latest/common/supplemental/languageInfo.xml) has many fallback rules indicating that a user who understands a given language (in its usual script) may also understand a different language (in that language's own usual script, which is different). One example is the rule matching bn_Beng -> en_Latn at a distance of 10.

There is a rule for matching chr -> en at a distance of 10, but unlike these other fallback rules, it does not specify the scripts: it is written as chr -> en rather than chr_Cher -> en_Latn.

If I understand the matching process (http://unicode.org/reports/tr35/#LanguageMatching) properly, this happens:

  • The tags 'chr' and 'en' are maximized to 'chr_Cher_US' and 'en_Latn_US'.
  • The region subtags match exactly, so our match distance so far is 0. We go on to compare 'chr_Cher' to 'en_Latn'.
  • The highest-priority rule that matches 'chr_Cher' to 'en_Latn' is the rule that matches '*_*' to '*_*' at a distance of 40. We add 40 to the match distance, and remove the final subtags, proceeding to compare 'chr' to 'en'.
  • 'chr' matches 'en' at a distance of 10, so we add 10 to the match distance. The final match distance is 50.
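
The steps above can be traced with a minimal Python sketch. It assumes the tags are already maximized, uses a rule table containing only the two rules mentioned in this ticket, and treats rules as an ordered list where the first match wins. The names RULES, field_matches, lookup and match_distance are hypothetical and do not come from CLDR or ICU; this illustrates my reading of the process, not the reference implementation.

```python
# Illustrative excerpt of the rule table: (desired, supported, distance),
# searched in order. Only the two rules discussed in this ticket are listed;
# the real languageInfo.xml contains many more.
RULES = [
    ("chr", "en", 10),
    ("*_*", "*_*", 40),
]


def field_matches(pattern, tag):
    """Match a rule pattern such as '*_*' against a tag such as 'chr_Cher'."""
    p_parts, t_parts = pattern.split("_"), tag.split("_")
    return len(p_parts) == len(t_parts) and all(
        p == "*" or p == t for p, t in zip(p_parts, t_parts)
    )


def lookup(desired, supported):
    """Return the distance of the first rule that matches this pair of tags."""
    for rule_desired, rule_supported, distance in RULES:
        if field_matches(rule_desired, desired) and field_matches(
            rule_supported, supported
        ):
            return distance
    raise LookupError(f"no rule for {desired} -> {supported}")


def match_distance(desired, supported):
    """Compare region, then script, then language, summing rule distances.

    Assumes both tags are already maximized and have the same number of
    subtags (language_Script_REGION).
    """
    d_parts, s_parts = desired.split("_"), supported.split("_")
    total = 0
    while d_parts:
        # Identical final subtags contribute nothing at this level.
        if d_parts[-1] != s_parts[-1]:
            total += lookup("_".join(d_parts), "_".join(s_parts))
        # Remove the final subtag from each tag and continue.
        d_parts.pop()
        s_parts.pop()
    return total


print(match_distance("chr_Cher_US", "en_Latn_US"))  # 50
```

With only these two rules, the script-level comparison falls through to the wildcard rule (40) because the 'chr' -> 'en' rule has no script field, and the language-level comparison then adds 10.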

50 is a very large distance compared to most matches. It seems that a Cherokee user is presumed to find written English nearly indecipherable unless it is English written in the Cherokee alphabet (which would be very silly).

It appears to me that what's missing is a rule that matches 'chr_Cher' to 'en_Latn' at a distance of 10.


Change History

comment:1 Changed 21 months ago by pedberg

  • Owner changed from anybody to mark
  • Phase changed from dsub to rc
  • Status changed from new to accepted
  • Priority changed from assess to medium
