
CLDR Ticket #11447(accepted)

Opened 5 months ago

Last modified 3 months ago

Add aliases

Reported by: mark
Owned by: mark
Component: locale-codes-names
Data Locale:
Phase: dvet
Review:
Weeks:
Data Xpath:
Xref:

Description (last modified by mark) (diff)

  1. We don't have variantAliases for the following 3 values: heploc, arevela, arevmda. We do have uppercase variants for them, so we should lowercase those, and then document that casing is normalized for all alias lookup (lowercase, except for region and script).

We should also generate those automatically from the LSTR Preferred-Value, so that these and any future ones are added.

  2. We don't alias all language Retirements from ISO 639. In particular, we only cover languages supported for display names (attributeValueValidity.xml, <variable id='$language' type='choice'>) UNION the most-prominent encompassed-language to macrolanguage mappings. We should go ahead and include all of the Retirements from ISO 639. That would add (currently) 74 languageAlias values.
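As a rough illustration of the casing normalization proposed in item 1, the sketch below lowercases every subtag except script and region. The function name and the choice of title case for scripts and uppercase for regions are assumptions (they follow the usual BCP 47 conventions); the ticket itself only says "lowercase except region and script".

```python
def normalize_subtag_case(kind: str, value: str) -> str:
    """Return `value` in the case assumed canonical for its subtag kind.

    `kind` is one of "language", "script", "region", "variant".
    Hypothetical helper, not part of CLDR.
    """
    if kind == "script":
        # Assumption: scripts use title case, e.g. "latn" -> "Latn".
        return value[:1].upper() + value[1:].lower()
    if kind == "region":
        # Assumption: regions use uppercase, e.g. "su" -> "SU".
        return value.upper()
    # Languages and variants are lowercased, so "AREVELA" and "arevela"
    # resolve to the same alias key.
    return value.lower()


print(normalize_subtag_case("variant", "HEPLOC"))
print(normalize_subtag_case("region", "su"))
```

With lookup keys normalized this way, the alias data only needs one entry per variant regardless of how the input was cased.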


Change History

comment:1 Changed 5 months ago by mark

  • Description modified (diff)

comment:2 Changed 4 months ago by mark

  • Description modified (diff)

comment:3 Changed 4 months ago by mark

  • Owner changed from anybody to mark
  • Priority changed from assess to major
  • type changed from unknown to data
  • Status changed from new to accepted
  • Milestone changed from UNSCH to 35

comment:4 Changed 4 months ago by mark

I did some further work on the locales, captured in

https://docs.google.com/spreadsheets/d/1U-0Kcr1G9Qm1BlzJackF_bJFHLiJcahQbedIjsiuUBY/edit?usp=sharing

  1. red are removals
    1. remove mappings whose addLikelySubtags are the same
    2. remove mappings where the territoryAlias has type="AAA"... (3 letters). These can't occur in valid BCP-47 language tags
  2. orange are changes
    1. The replacements having multiple items are unnecessarily complex to deal with, requiring use of likely subtags. These are pre-analysed so that the language field is used to disambiguate where possible.
    2. Where the type field has _ values, it is broken apart so that the main item is looked up, and other subtags are treated as context. So this adds attributes cLang="x" (meaning in context the language subtag must be "x"), cRegion (for the region/territory field), cVariant (a variant field). (No need for cScript as yet.)
    3. Where the replacement field has values that are not of the same type as the type field, they are also broken out. The replacement field keeps the subtag that has the same type, while the others are broken out as rScript, rRegion, rVariant. (No need for rLang as yet.) These fields are to be added only if missing, but a value of "" indicates that the field should be removed.
    4. The suffix -notUbli is added to the reason if the type value cannot occur in a valid Unicode BCP-47 Locale Identifier

Examples:


<languageAlias type="aa" cVariant="SAAHO" replacement="ssy" rVariant="" reason="deprecated"/>

If the language is 'aa' and a variant is 'SAAHO', replace the language subtag by ssy and delete the variant SAAHO (but leave other variants).


<languageAlias type="cnr" replacement="sr" rRegion="ME" reason="legacy"/>

If the language is 'cnr', replace it by 'sr' and add the region ME if there is no region code already.


<territoryAlias type="SU" cLang="az" replacement="AZ" reason="deprecated"/>
<territoryAlias type="SU" cLang="be" replacement="BY" reason="deprecated"/>
...
<territoryAlias type="SU" replacement="RU" reason="deprecated"/>

If the region is SU: if the language is 'az', replace the region by AZ; if the language is 'be', replace the region by BY; ...; otherwise replace the region by RU.
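The three examples above can be sketched in code as follows. This is only an illustration of the proposed attribute semantics, not an implementation from CLDR: the `Tag` class, the dict-based rule records, the `kind` key, and the function names are all hypothetical. It also assumes that an empty rVariant removes only the variant named in cVariant (as in the 'aa'/'SAAHO' example) and that rules are tried in order, with contextual rules listed before the context-free fallback.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tag:
    """Minimal parsed locale tag (hypothetical; no extensions handled)."""
    language: str
    script: str = ""
    region: str = ""
    variants: tuple = ()


# Rule records mirroring the XML examples; "kind" says which field `type` names.
RULES = [
    {"kind": "language", "type": "aa", "cVariant": "SAAHO",
     "replacement": "ssy", "rVariant": ""},
    {"kind": "language", "type": "cnr", "replacement": "sr", "rRegion": "ME"},
    {"kind": "region", "type": "SU", "cLang": "az", "replacement": "AZ"},
    {"kind": "region", "type": "SU", "cLang": "be", "replacement": "BY"},
    {"kind": "region", "type": "SU", "replacement": "RU"},  # fallback, listed last
]


def matches(tag, rule):
    """True if the rule's type and all of its context attributes match the tag."""
    main = tag.language if rule["kind"] == "language" else tag.region
    if main.lower() != rule["type"].lower():
        return False
    if "cLang" in rule and tag.language.lower() != rule["cLang"].lower():
        return False
    if "cRegion" in rule and tag.region.lower() != rule["cRegion"].lower():
        return False
    if "cVariant" in rule:
        if rule["cVariant"].lower() not in (v.lower() for v in tag.variants):
            return False
    return True


def apply_rule(tag, rule):
    """Replace the main subtag, then add-if-missing or remove the r* fields."""
    language, script, region = tag.language, tag.script, tag.region
    variants = list(tag.variants)
    if rule["kind"] == "language":
        language = rule["replacement"]
    else:
        region = rule["replacement"]
    if "rScript" in rule:
        script = "" if rule["rScript"] == "" else (script or rule["rScript"])
    if "rRegion" in rule:
        region = "" if rule["rRegion"] == "" else (region or rule["rRegion"])
    if "rVariant" in rule:
        if rule["rVariant"] == "":
            # Assumption: drop only the context variant, keep the others.
            ctx = rule.get("cVariant", "").lower()
            variants = [v for v in variants if v.lower() != ctx]
        elif rule["rVariant"].lower() not in (v.lower() for v in variants):
            variants.append(rule["rVariant"])
    return Tag(language, script, region, tuple(variants))


def canonicalize(tag, rules=RULES):
    """Apply the first matching rule, or return the tag unchanged."""
    for rule in rules:
        if matches(tag, rule):
            return apply_rule(tag, rule)
    return tag


print(canonicalize(Tag("cnr")))
print(canonicalize(Tag("az", region="SU")))
```

Because the first matching rule wins, the order of the SU records matters: the context-free SU rule has to come after the cLang-specific ones.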


comment:5 Changed 4 months ago by mark

This does not yet add the Preferred Values / Retirements; we also have to decide what to do about the casing of variants.

comment:6 Changed 4 months ago by mark

Some more about the reason for the changes. Currently the data is provided in a form that is not set up for processing, so you have to go through and build up tables so that accessing the data works cleanly and modifying the result is also simple.

Changing the data somewhat makes it easy to build an exception table for the items that have context. All other cases then take the simple fast path.
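One way to picture the exception table and the fast path is the sketch below, limited to territory aliases with a cLang context. The record layout and function names are hypothetical, and the BU and DD entries are included only as illustrative context-free aliases.

```python
from collections import defaultdict


def build_tables(records):
    """Split alias records into a context-free map and an exception table.

    `records` are dicts with "type", "replacement" and an optional "cLang"
    (hypothetical in-memory form of the territoryAlias elements).
    """
    simple = {}                      # type -> replacement (fast path)
    exceptions = defaultdict(list)   # type -> contextual records, in order
    for rec in records:
        key = rec["type"].upper()
        if "cLang" in rec:
            exceptions[key].append(rec)
        else:
            simple[key] = rec["replacement"]
    return simple, dict(exceptions)


def lookup_region(region, language, simple, exceptions):
    """Check the contextual exceptions first, then fall back to the simple map."""
    key = region.upper()
    for rec in exceptions.get(key, ()):
        if rec["cLang"].lower() == language.lower():
            return rec["replacement"]
    # Fast path: a single dictionary lookup; unknown regions pass through.
    return simple.get(key, region)


RECORDS = [
    {"type": "SU", "cLang": "az", "replacement": "AZ"},
    {"type": "SU", "cLang": "be", "replacement": "BY"},
    {"type": "SU", "replacement": "RU"},
    {"type": "BU", "replacement": "MM"},
    {"type": "DD", "replacement": "DE"},
]

SIMPLE, EXCEPTIONS = build_tables(RECORDS)
print(lookup_region("SU", "be", SIMPLE, EXCEPTIONS))
print(lookup_region("BU", "my", SIMPLE, EXCEPTIONS))
```

Only types that carry context ever touch the exception table; everything else resolves with one lookup in the simple map.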

comment:7 Changed 3 months ago by mark

  • Component changed from unknown to locale-codes

comment:8 Changed 3 months ago by mark

  • Phase changed from dsub to dvet