Make distance between en_IN and en_GB be asymmetric

We've gotten a report of a problem when the supported languages are {en_US, en_IN} and the desired language is en_GB: the better outcome would be en_US rather than en_IN.

To fix this, make

  • the distance from en_GB to en_IN be larger than to en_US, but
  • the distance from en_IN to en_GB be smaller than to en_US

Also look at similar cases.
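The two constraints above only make sense if distance is directed, from the desired locale to the supported one. Below is a minimal Python sketch of a check for them. It assumes a plain dictionary keyed by (desired, supported). The function name `meets_request` and the example numbers are hypothetical and chosen only for illustration:

```python
# Directed distance table: key is (desired, supported); no symmetry is assumed.
Distance = dict[tuple[str, str], int]


def meets_request(dist: Distance) -> bool:
    """Check the two constraints from this ticket (hypothetical helper)."""
    # A user who wants en_GB should be offered en_US before en_IN.
    gb_prefers_us = dist[("en_GB", "en_IN")] > dist[("en_GB", "en_US")]
    # A user who wants en_IN should be offered en_GB before en_US.
    in_prefers_gb = dist[("en_IN", "en_GB")] < dist[("en_IN", "en_US")]
    return gb_prefers_us and in_prefers_gb


# Illustrative values only.
example = {
    ("en_GB", "en_US"): 3, ("en_GB", "en_IN"): 4,
    ("en_IN", "en_GB"): 3, ("en_IN", "en_US"): 5,
}
print(meets_request(example))
```

A table where every cross-region pair has the same distance fails both checks, which is the tie that produced the reported behavior.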


Our British linguist had to speculate a bit:
I guess the answer to your question would depend on how close en_IN is to en_GB and, having no experience of en_IN, I wouldn’t be able to answer that. On the one hand, I would assume that historically en_IN is closer to en_GB than to en_US, but maybe culturally these days that’s not so much the case. Equally, there may be Indian-specific usages in the software you mention that most Brits wouldn’t recognise.

The UK’s exposure to US English through film and TV, as well as the fact that some US software companies don't localise for the UK market, makes it more likely that a UK user using en_US software won’t encounter any terminology they’re not used to, even if they recognise it as US English. For this reason, I guess a British user would prefer to fall back on American English rather than Indian.

Confirmed that en-US would be preferred as a fallback for en-GB.

I've implemented enhanced language matching and am adding support for the en-GB asymmetric distance. Based on my understanding of the discussion, I've added the following match rules to my library (diff against rules circa CLDR v31.0.1):

<languageMatch desired="en_*_$!enUS" supported="en_*_GB" distance="3" oneway="true" /> <!-- prefer en-GB over other non-enUS -->
<languageMatch desired="en_*_GB" supported="en_*_US" distance="3" oneway="true" /> <!-- preferred fallback for en-GB -->
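To check that the two oneway rules really produce the table below, here is a minimal Python sketch of a rule evaluator. It rests on several assumptions. The rules are read in order and the first match wins. A rule without `oneway="true"` also applies with desired and supported swapped. `$enUS` is taken to be the region set AS, GU, MH, MP, PR, UM, US, VI. The three trailing base rules are a simplified stand-in for the v31 rules, picked so that the distances match the reported table. They are not copied from CLDR. The names `Rule`, `matches`, and `distance` are hypothetical:

```python
from dataclasses import dataclass

# ASSUMPTION: membership of the $enUS match variable.
EN_US_REGIONS = {"AS", "GU", "MH", "MP", "PR", "UM", "US", "VI"}


@dataclass(frozen=True)
class Rule:
    desired: str      # pattern such as "en_*_GB" or "en_*_$!enUS"
    supported: str
    distance: int
    oneway: bool = False


def region_matches(pattern: str, region: str) -> bool:
    if pattern == "*":
        return True
    if pattern == "$enUS":
        return region in EN_US_REGIONS
    if pattern == "$!enUS":
        return region not in EN_US_REGIONS
    return pattern == region


def matches(pattern: str, locale: str) -> bool:
    # Patterns are language_script_region; locales here are language-region,
    # so the script field (always "*" below) is ignored.
    lang, _script, region = pattern.split("_")
    loc_lang, loc_region = locale.split("-")
    return lang in ("*", loc_lang) and region_matches(region, loc_region)


RULES = [
    # The two new oneway rules from this ticket, placed first so they win.
    Rule("en_*_$!enUS", "en_*_GB", 3, oneway=True),
    Rule("en_*_GB", "en_*_US", 3, oneway=True),
    # ASSUMPTION: simplified base rules that reproduce the reported table.
    Rule("en_*_$enUS", "en_*_$enUS", 4),
    Rule("en_*_$!enUS", "en_*_$!enUS", 4),
    Rule("en_*_*", "en_*_*", 5),
]


def distance(desired: str, supported: str) -> int:
    """Distance of the first matching rule (first match wins)."""
    for rule in RULES:
        if matches(rule.desired, desired) and matches(rule.supported, supported):
            return rule.distance
        # Symmetric rules also apply in the reverse direction.
        if not rule.oneway and matches(rule.desired, supported) \
                and matches(rule.supported, desired):
            return rule.distance
    raise ValueError(f"no rule for {desired} -> {supported}")


print(distance("en-GB", "en-US"), distance("en-GB", "en-IN"))
```

Because the first rule is oneway, en-IN desired gets 3 to en-GB while en-US desired still falls through to the catch-all 5.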

The resulting pairwise distances from my test cases are:

supported   desired   distance
---------   -------   --------
 en-US       en-GB     3
 en-US       en-VI     4
 en-US       en-PR     4
 en-US       en-IN     5
 en-US       en-ZA     5

 en-GB       en-IN     3
 en-GB       en-ZA     3
 en-GB       en-US     5
 en-GB       en-VI     5
 en-GB       en-PR     5

 en-IN       en-GB     4
 en-IN       en-ZA     4
 en-IN       en-US     5
 en-IN       en-VI     5

 en-VI       en-US     4
 en-VI       en-PR     4
 en-VI       en-GB     5
 en-VI       en-IN     5
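The table can be used directly to confirm the original scenario. Below is a minimal Python sketch of best-fit selection over these distances. It copies the numbers above into a dictionary keyed by (desired, supported) and picks the supported locale with the smallest distance. Ties go to the first locale listed. The helper `best_supported` is hypothetical. A real matcher also applies a cutoff threshold and other refinements, which are omitted here:

```python
# Distances copied from the table above, keyed by (desired, supported).
REPORTED = {
    ("en-GB", "en-US"): 3, ("en-VI", "en-US"): 4, ("en-PR", "en-US"): 4,
    ("en-IN", "en-US"): 5, ("en-ZA", "en-US"): 5,
    ("en-IN", "en-GB"): 3, ("en-ZA", "en-GB"): 3, ("en-US", "en-GB"): 5,
    ("en-VI", "en-GB"): 5, ("en-PR", "en-GB"): 5,
    ("en-GB", "en-IN"): 4, ("en-ZA", "en-IN"): 4, ("en-US", "en-IN"): 5,
    ("en-VI", "en-IN"): 5,
    ("en-US", "en-VI"): 4, ("en-PR", "en-VI"): 4, ("en-GB", "en-VI"): 5,
    ("en-IN", "en-VI"): 5,
}


def best_supported(desired: str, supported: list[str]) -> str:
    """Smallest distance wins; min() keeps the first item on ties."""
    return min(supported, key=lambda s: REPORTED[(desired, s)])


print(best_supported("en-GB", ["en-US", "en-IN"]))  # the reported case
print(best_supported("en-IN", ["en-US", "en-GB"]))  # the reverse direction
```

With these values, en-GB desired resolves to en-US (3 vs 4) and en-IN desired resolves to en-GB (3 vs 5). These are the two outcomes the ticket asks for.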

Do these rules correctly address this issue, or is there a better way to express them?

Distance table: https://github.com/Squarespace/cldr/blob/master/notes/language-distance-table.txt#L55


