Unicode Common Locale Data Repository: Bug Tracking

CLDR Ticket #10148 (reviewing data)

Opened 7 months ago

Last modified 8 days ago

Make distance between en_IN and en_GB be asymmetric

Reported by: mark
Owned by: mark
Component: other
Data Locale:
Phase: rc
Review: pedberg
Weeks:
Data Xpath:


We've gotten a report of a problem when the supported languages are {en_US, en_IN} and the desired language is en_GB: the better outcome would be en_US rather than en_IN.

To fix this, make

  • the distance from en_GB to en_IN be larger than to en_US, but
  • the distance from en_IN to en_GB be smaller than to en_US

And look at similar cases.
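
A minimal Python sketch of the requested behaviour. The symmetric distances are illustrative assumptions, and the asymmetric ones are borrowed from the test table in comment:7; neither set is the shipped CLDR data. The function names are hypothetical.

```python
# Illustrative only: the symmetric values are assumed for the example.
SYMMETRIC = {
    frozenset(("en_GB", "en_IN")): 4,
    frozenset(("en_GB", "en_US")): 5,
    frozenset(("en_IN", "en_US")): 5,
}

# Keyed by (desired, supported); values follow the table in comment:7.
ASYMMETRIC = {
    ("en_GB", "en_US"): 3,
    ("en_GB", "en_IN"): 4,
    ("en_IN", "en_GB"): 3,
    ("en_IN", "en_US"): 5,
}


def best_symmetric(desired, supported):
    """Pick the supported locale with the smallest symmetric distance."""
    return min(supported, key=lambda s: SYMMETRIC[frozenset((desired, s))])


def best_asymmetric(desired, supported):
    """Pick the supported locale with the smallest desired->supported distance."""
    return min(supported, key=lambda s: ASYMMETRIC[(desired, s)])


# The reported problem: with symmetric distances en_GB falls back to en_IN.
print(best_symmetric("en_GB", ["en_US", "en_IN"]))
# With asymmetric distances en_GB falls back to en_US,
# while en_IN still prefers en_GB over en_US.
print(best_asymmetric("en_GB", ["en_US", "en_IN"]))
print(best_asymmetric("en_IN", ["en_US", "en_GB"]))
```

Keying the lookup on the ordered pair (desired, supported) is what allows the two directions to differ.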


Change History

comment:1 Changed 7 months ago by emmons

  • Status changed from new to accepted
  • Component changed from unknown to other
  • Priority changed from assess to minor
  • Phase changed from dsub to rc
  • Milestone changed from UNSCH to 32
  • Owner changed from anybody to mark
  • Type changed from unknown to data

comment:2 Changed 7 months ago by emmons

  • Cc kristi, kiara added
  • Owner changed from mark to fredrik

comment:3 Changed 6 months ago by fredrik

  • Cc fredrik added
  • Owner changed from fredrik to mark

Our British linguist had to speculate a bit:
I guess the answer to your question would depend on how close en_IN is to en_GB and, having no experience of en_IN, I wouldn’t be able to answer that. On the one hand, I would assume that historically en_IN is closer to en_GB than to en_US, but maybe culturally these days that’s not so much the case. Equally, there may be Indian-specific usages in the software you mention that most Brits wouldn’t recognise.

The UK’s exposure to US English through film and TV, as well as the fact that some US software companies don't localise for the UK market, makes it more likely that a UK user using en_US software won’t encounter any terminology they’re not used to, even if they recognise it as US English. For this reason, I guess a British user would prefer to fall back on American English rather than Indian.

comment:4 Changed 6 months ago by kristi

Confirmed that en-US would be preferred as a fallback for en-GB.

comment:5 Changed 6 weeks ago by mark

  • Status changed from accepted to reviewing
  • Review set to kristi

comment:6 Changed 4 weeks ago by kristi

  • Review changed from kristi to pedberg

Peter, could you do the code review?

comment:7 Changed 8 days ago by Patrick Hensley <pathensley@…>

I've implemented enhanced language matching and am adding support for the en-GB asymmetric distance. Based on my understanding of the discussion, I've added the following match rules to my library (diff against rules circa CLDR v31.0.1):

<languageMatch desired="en_*_$!enUS" supported="en_*_GB" distance="3" oneway="true" /> <!-- prefer en-GB over other non-enUS -->
<languageMatch desired="en_*_GB" supported="en_*_US" distance="3" oneway="true" /> <!-- preferred fallback for en-GB -->

The resulting pairwise distances from my test cases are:

supported   desired   distance
---------   -------   --------
 en-US       en-GB     3
 en-US       en-VI     4
 en-US       en-PR     4
 en-US       en-IN     5
 en-US       en-ZA     5

 en-GB       en-IN     3
 en-GB       en-ZA     3
 en-GB       en-US     5
 en-GB       en-VI     5
 en-GB       en-PR     5

 en-IN       en-GB     4
 en-IN       en-ZA     4
 en-IN       en-US     5
 en-IN       en-VI     5

 en-VI       en-US     4
 en-VI       en-PR     4
 en-VI       en-GB     5
 en-VI       en-IN     5

I'm curious if these rules correctly address this issue, or if there is a better way to express them?

Distance table: https://github.com/Squarespace/cldr/blob/master/notes/language-distance-table.txt#L55
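
For cross-checking, the two rules and the table above can be reproduced with a short Python sketch. It is not the library's implementation, and it rests on assumptions inferred from the table: the $enUS group contains at least US, VI and PR; regions in the same group are at distance 4; every other pair defaults to 5; the first matching rule wins; and only rules without oneway="true" are also tried in reverse. All names are hypothetical.

```python
# Assumption: regions treated as part of $enUS (the table implies at least these).
EN_US_GROUP = {"US", "VI", "PR"}


def in_en_us(region):
    return region in EN_US_GROUP


def not_en_us(region):
    return region not in EN_US_GROUP


# (desired predicate, supported predicate, distance, oneway); first match wins.
RULES = [
    (not_en_us, lambda r: r == "GB", 3, True),           # prefer en-GB over other non-enUS
    (lambda r: r == "GB", lambda r: r == "US", 3, True),  # preferred fallback for en-GB
    (in_en_us, in_en_us, 4, False),                       # assumed same-group distance
    (not_en_us, not_en_us, 4, False),                     # assumed same-group distance
]
DEFAULT_DISTANCE = 5  # assumed from the table, for pairs no rule matches


def region_distance(desired, supported):
    """Distance from a desired English region to a supported one."""
    if desired == supported:
        return 0
    for match_desired, match_supported, distance, oneway in RULES:
        if match_desired(desired) and match_supported(supported):
            return distance
        # Two-way rules also apply with the roles swapped.
        if not oneway and match_desired(supported) and match_supported(desired):
            return distance
    return DEFAULT_DISTANCE


def best_match(desired, supported_regions):
    """Supported region with the smallest distance from the desired region."""
    return min(supported_regions, key=lambda s: region_distance(desired, s))


# The case from the ticket description: desired en-GB, supported {en-US, en-IN}.
print(best_match("GB", ["US", "IN"]))
```

Under these assumptions the sketch yields the same distances as the table, and the one-way flag is what keeps the en-GB to en-US preference from also applying in the en-US to en-GB direction.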

