[Unicode] Common Locale Data Repository: Bug Tracking

CLDR Ticket #8407 (accepted data)

Opened 3 years ago

Last modified 8 months ago

Improve readability and maintainability of coverageLevels.xml

Reported by: mark
Owned by: mark
Component: supplemental
Data Locale:
Phase: dsub
Review:
Weeks:
Data Xpath:


  1. Whenever we see long lists of items, it is hard to tell whether two of them are exactly the same without minute inspection. It is much better for maintenance to use variables. In approvalRequirement, for example, replace strings like the following with variables:

"ar ca cs da de el es fi fr he hi hr hu it ja ko nb nl pl pt pt_PT ro ru sk sl sr sv th tr uk vi zh zh_Hant"
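As a minimal sketch of the idea, assuming variables are referenced as %name (mirroring existing coverageVariable keys such as %script100), a shared locale list could be defined once and expanded wherever it is used. The class VariableExpander and the variable %tier1 are hypothetical illustrations, not existing CLDR tooling.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper (not CLDR code): expands %name references from a table of named lists. */
final class VariableExpander {
    // '%' followed by a letter, then letters, digits or '_' (e.g. %tier1, %script100).
    private static final Pattern REF = Pattern.compile("%[A-Za-z][A-Za-z0-9_]*");

    static String expand(String text, Map<String, String> variables) {
        Matcher m = REF.matcher(text);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            String value = variables.get(m.group()); // keys include the leading '%'
            if (value == null) {
                // Fail loudly rather than leave a stray reference in the output.
                throw new IllegalArgumentException("Undefined variable: " + m.group());
            }
            m.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

With the locale list stored once under a name, two requirements that use the same list visibly reference the same variable, so no character-by-character comparison is needed.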

  2. Regular expressions that are purely lists of items should be expressed as lists rather than hand-optimized. Otherwise they are quite difficult to parse, review, and change without error. See:


If we need to optimize them, a separate internal method should do that. I'd recommend an alternate attribute, like the following, that takes a space-delimited list:

list="length-picometer length-light-year..."

When the variable is encountered, the list can be optimized internally into a regex if necessary. Such automatic optimization can do a much better job than hand-optimization.
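The following Java sketch shows the unoptimized conversion such an internal method might start from: it reads a space-delimited list like the proposed list attribute and produces an equivalent regex. The class ListToRegex is hypothetical; it quotes each item with Pattern.quote and does no prefix factoring.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

/** Hypothetical helper (not CLDR code): turns a space-delimited list into an equivalent regex. */
final class ListToRegex {
    static String toRegex(String list) {
        return Arrays.stream(list.trim().split("\\s+"))
                .distinct()
                // Longest items first: with find() or lookingAt(), a short item that is a
                // prefix of a longer one would otherwise win the alternation.
                .sorted(Comparator.comparingInt(String::length).reversed()
                        .thenComparing(Comparator.naturalOrder()))
                // Quote each item so regex metacharacters such as '.' or '(' match literally.
                .map(Pattern::quote)
                .collect(Collectors.joining("|", "(?:", ")"));
    }
}
```

Keeping the source as a plain list and deriving the regex from it means reviewers only ever read and edit the list.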

  3. The lists can be more self-documenting if we introduce some preset variables. For example, the list of all CLDR-organization languages can be fetched at startup instead of written out in a variable (where it can fall out of date).

I suggest %% syntax for those. Example:


The variable would internally be set at startup to:
StandardCodes.make().getLocaleCoverageLocales(Organization.cldr.name(), EnumSet.of(Level.MODERN));

We can do this for many cases. For example:

This would get the non-private-use, non-deprecated region codes.

That would let us replace many of the variables used for modern coverage with a full, always-up-to-date list; for example, replacing

<coverageVariable key="%script100" value="(Afak|Aghb|Ahom|Armi|Avst|Bali|Bamu|Bass|Batk|Blis|Brah|Bugi|Buhd|Cakm|Cans|Cari|Cham|Cher|Cirt|Copt|Cprt|Cyrs|Dsrt|Dupl|Egy[dhp]|Elba|Geok|Glag|Goth|Gran|Hatr|Hano|Hluw|Hmng|Hrkt|Hung|Inds|Ital|Java|Jurc|Kali|Khar|Khoj|Kpel|Kthi|Lana|Lat[fg]|Lepc|Limb|Lin[ab]|Lisu|Loma|Ly[cd]i|Mahj|Man[di]|Maya|Mend|Mer[co]|Modi|Moon|Mroo|Mtei|Mult|Narb|Nbat|Nkgb|Nkoo|Nshu|Ogam|Olck|Orkh|Osma|Palm|Pauc|Perm|Phag|Phl[ipv]|Phnx|Plrd|Prti|Rjng|Roro|Runr|Samr|Sar[ab]|Saur|Sgnw|Shaw|Shrd|Sidd|Sind|Sora|Sund|Sylo|Syr[cejn]|Tagb|Takr|Tal[eu]|Tang|Tavt|Teng|Tfng|Tglg|Tirh|Ugar|Vaii|Visp|Wara|Wole|Xpeo|Xsux|Yiii|Zinh|Zmth)"/>

with

<coverageVariable key="%script100" value="%%scripts"/>
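One possible shape for %% presets, sketched in Java under stated assumptions: each preset name maps to a supplier that is evaluated once at startup, and a %%name reference is replaced by a regex group of the resulting items. The class PresetVariables is hypothetical; in CLDR the supplier for %%scripts would call into real data such as StandardCodes, which is stubbed out here.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Supplier;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch (not CLDR code): resolves %%name presets to values computed at startup. */
final class PresetVariables {
    private static final Pattern PRESET = Pattern.compile("%%([A-Za-z][A-Za-z0-9_]*)");

    private final Map<String, List<String>> values;

    PresetVariables(Map<String, Supplier<List<String>>> suppliers) {
        // Evaluate every supplier once, at startup, so later lookups are cheap.
        Map<String, List<String>> resolved = new HashMap<>();
        suppliers.forEach((name, s) -> resolved.put(name, List.copyOf(s.get())));
        this.values = Map.copyOf(resolved);
    }

    /** Replaces each %%name with a group like (a|b|c); assumes items have no regex metacharacters. */
    String resolve(String attributeValue) {
        Matcher m = PRESET.matcher(attributeValue);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            List<String> items = values.get(m.group(1));
            if (items == null) {
                throw new IllegalArgumentException("Unknown preset: " + m.group());
            }
            m.appendReplacement(out, Matcher.quoteReplacement("(" + String.join("|", items) + ")"));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

Evaluating the suppliers eagerly in the constructor matches the set-at-startup behavior proposed in this item, and a misspelled preset name fails loudly instead of silently matching nothing.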

  4. This is probably for later, but for many attributes we know exactly what the possible values are from DtdData or supplementalMetadata.xml, so we can populate variables with those values. We could automatically set variables like:


instead of manually setting

<coverageVariable key="%dayTypes" value="(sun|mon|tue|wed|thu|fri|sat)"/>
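A sketch of generating such a variable instead of writing it by hand, assuming the allowed values have already been read from DtdData or supplementalMetadata.xml (that lookup is not shown). The class EnumVariable is hypothetical.

```java
import java.util.List;

/** Hypothetical helper (not CLDR code): builds a coverageVariable element from known values. */
final class EnumVariable {
    static String coverageVariable(String key, List<String> values) {
        // Assumes plain tokens: no regex metacharacters and no XML characters needing escapes.
        return "<coverageVariable key=\"" + key + "\" value=\"("
                + String.join("|", values) + ")\"/>";
    }
}
```

Comparing the generated element against the hand-written one is also a cheap way to catch a manual list that has drifted from the DTD.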


Change History

comment:1 Changed 3 years ago by emmons

  • Status changed from new to accepted
  • Component changed from unknown to supplemental
  • Priority changed from assess to medium
  • Milestone changed from UNSCH to 29
  • Owner changed from anybody to mark
  • Type changed from unknown to data

comment:2 Changed 3 years ago by emmons

  • Milestone changed from 29 to upcoming

comment:3 Changed 8 months ago by mark

Note: the current formulation for the coverage variables is also not optimal:







The optimized expression groups as much as possible, with no backtracking: matching just moves forward through the alternations.
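That kind of grouping can be produced mechanically rather than by hand. This Java sketch builds a character trie from the items and emits a prefix-factored regex in which the branches of every alternation start with distinct characters, so at most one branch can match the next input character. TrieRegex is a hypothetical name, trie factoring is one standard technique rather than a confirmed CLDR implementation, and the sketch assumes at least one non-empty item made of BMP characters.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch (not CLDR code): builds a prefix-factored regex from literal items. */
final class TrieRegex {
    private static final class Node {
        final TreeMap<Character, Node> children = new TreeMap<>(); // sorted for stable output
        boolean terminal; // true if some item ends at this node
    }

    static String build(Collection<String> items) {
        Node root = new Node();
        for (String item : items) {
            Node n = root;
            for (char c : item.toCharArray()) {
                n = n.children.computeIfAbsent(c, k -> new Node());
            }
            n.terminal = true;
        }
        // Top level: one capturing group, like the hand-written coverage variables.
        return "(" + String.join("|", branches(root)) + ")";
    }

    /** One branch per distinct next character; single-char endings merge into a [..] class. */
    private static List<String> branches(Node n) {
        StringBuilder leafChars = new StringBuilder();
        List<String> out = new ArrayList<>();
        for (Map.Entry<Character, Node> e : n.children.entrySet()) {
            if (e.getValue().children.isEmpty()) {
                leafChars.append(e.getKey().charValue()); // an item ends right after this char
            } else {
                out.add(escape(e.getKey()) + suffix(e.getValue()));
            }
        }
        if (leafChars.length() == 1) {
            out.add(escape(leafChars.charAt(0)));
        } else if (leafChars.length() > 1) {
            out.add("[" + escapeInClass(leafChars) + "]");
        }
        return out;
    }

    /** Regex for whatever may follow node n; n has at least one child. */
    private static String suffix(Node n) {
        List<String> b = branches(n);
        String alt = b.size() == 1 ? b.get(0) : "(?:" + String.join("|", b) + ")";
        if (!n.terminal) {
            return alt;
        }
        // An item ends at n but longer items continue, so the continuation is optional.
        return b.size() == 1 ? "(?:" + alt + ")?" : alt + "?";
    }

    private static String escape(char c) {
        return "\\.[]{}()*+?^$|".indexOf(c) >= 0 ? "\\" + c : String.valueOf(c);
    }

    private static String escapeInClass(CharSequence chars) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < chars.length(); i++) {
            char c = chars.charAt(i);
            if ("\\[]^-&".indexOf(c) >= 0) {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }
}
```

For example, TrieRegex.build(List.of("sun", "mon", "tue", "wed", "thu", "fri", "sat")) returns "(fri|mon|s(?:at|un)|t(?:hu|ue)|wed)", and items such as Latf and Latg collapse to Lat[fg], the same shape as the hand-optimized %script100 value.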

