[Unicode] Common Locale Data Repository: Bug Tracking

CLDR Ticket #6739 (closed task: fixed)

Opened 2 years ago

Last modified 19 months ago

specify collation type fallback

Reported by: markus
Owned by: markus
Component: xxx-spec
Data Locale:
Phase:
Review: pedberg
Weeks: 0.1
Data Xpath:


I found that instantiating an ICU collator simply falls back to the root collator when the requested collation type is not available, which seems bad. I believe that CLDR does not specify the fallback, and I think it should.

For ICU, we just agreed on the following fallback order: from "search.+" (any type that starts with "search" but is longer) to "search", from any missing type to the default type, then to "standard", and finally to the root collator. (IcuBug:10469)

For example, assume that we have collation data for the following tailorings. ("da/search" is shorthand for "da-u-co-search".)

  • root/defaultCollation=standard
  • root/standard (this is the same as "the CLDR root collator")
  • root/search
  • da/standard
  • da/search
  • el/standard
  • ko/standard
  • ko/search
  • ko/searchjl
  • zh/defaultCollation=pinyin
  • zh/pinyin
  • zh/stroke
  • zh-Hant/defaultCollation=stroke
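
The shorthand can be expanded mechanically. The following is a minimal sketch, not part of the ticket: the function names to_locale_id and to_shorthand are hypothetical, and it only handles the simple case where "-u-co-" is the sole Unicode extension keyword in the ID.

```python
def to_locale_id(shorthand: str) -> str:
    """Expand ticket shorthand: 'da/search' -> 'da-u-co-search'; 'zh' -> 'zh'."""
    locale, sep, coll_type = shorthand.partition("/")
    return f"{locale}-u-co-{coll_type}" if sep else locale


def to_shorthand(locale_id: str) -> str:
    """Reverse mapping: 'da-u-co-search' -> 'da/search'.

    Assumption: the ID carries no other -u- keywords; IDs without
    '-u-co-' are returned unchanged.
    """
    locale, sep, coll_type = locale_id.partition("-u-co-")
    return f"{locale}/{coll_type}" if sep else locale_id
```

Entries such as "root/defaultCollation=standard" are not locale IDs. They record the value of the <defaultCollation> element for that locale, so they are outside the scope of these helpers.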

Sample requested & actual collation locales & types:

requested          actual         comment
da/phonebook       da/standard    default type for Danish
zh                 zh/pinyin      default type for zh
zh/standard        root/standard  no "standard" tailoring for zh, falls back to root
zh/phonebook       zh/pinyin      default type for zh
zh-Hant/phonebook  zh/stroke      default type for zh-Hant is "stroke"
da/searchjl        da/search      "search.+" falls back to "search"
el/search          root/search    no "search" tailoring for Greek
el/searchjl        root/search    "search.+" falls back to "search", found in root
ko/searchjl        ko/searchjl    requested data is actually available

Fallback mechanism:

  • Load the resource bundle for the locale, with the usual fallback (might go to the default locale, eventually to root). (Need to rephrase in terms of CLDR.)
    • All of the following lookups use item fallbacks along the bundle/parent bundle chain of loaded resource bundles.
  • If the input locale ID does not have a collation keyword, or has -u-co-default (is this valid for -u-co?), then fetch the type from the <defaultCollation> element.
    • If there is no <defaultCollation>, then fall back to type="standard". (This should not happen with CLDR data: collation/root.xml should have the <defaultCollation> element, with the "standard" value.)
  • Get the <collation type="(type from keyword)"> data.
  • If missing, and the type starts with "search" but is longer, then try type="search".
  • If missing, and the type is not the default type, then try the default type.
  • If missing, then try type="standard".
  • Ultimately fall back to the root collator.
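
The steps above can be sketched in a few lines of Python. This is an illustration only, not ICU or CLDR code: the dictionaries PARENT, DEFAULTS and TAILORINGS and the function names are hypothetical stand-ins for the loaded resource bundles, filled with the sample data from this ticket. It assumes the parent chain zh-Hant -> zh -> root used in the example, and it skips the "default locale" step of bundle loading.

```python
# Hypothetical stand-ins for resource bundles, using the ticket's sample data.
PARENT = {"da": "root", "el": "root", "ko": "root", "zh": "root", "zh-Hant": "zh"}
DEFAULTS = {"root": "standard", "zh": "pinyin", "zh-Hant": "stroke"}
TAILORINGS = {
    "root": {"standard", "search"},
    "da": {"standard", "search"},
    "el": {"standard"},
    "ko": {"standard", "search", "searchjl"},
    "zh": {"pinyin", "stroke"},
}


def chain(locale):
    """Yield the locale, then its parents, ending with root."""
    while True:
        yield locale
        if locale == "root":
            return
        locale = PARENT.get(locale, "root")


def default_type(locale):
    """First defaultCollation value along the chain, else 'standard'."""
    for loc in chain(locale):
        if loc in DEFAULTS:
            return DEFAULTS[loc]
    return "standard"


def find(locale, coll_type):
    """Item fallback: first bundle in the chain that has this type."""
    for loc in chain(locale):
        if coll_type in TAILORINGS.get(loc, ()):
            return f"{loc}/{coll_type}"
    return None


def resolve(requested):
    """Map a requested 'locale/type' shorthand to the actual tailoring."""
    locale, _, coll_type = requested.partition("/")
    default = default_type(locale)
    if coll_type in ("", "default"):
        coll_type = default
    candidates = [coll_type]
    if coll_type.startswith("search") and coll_type != "search":
        candidates.append("search")      # "search.+" falls back to "search"
    if coll_type != default:
        candidates.append(default)       # then the default type
    candidates.append("standard")        # then "standard"
    for candidate in candidates:
        hit = find(locale, candidate)
        if hit:
            return hit
    return "root/standard"               # ultimately the root collator
```

With the sample data, resolve("da/searchjl") returns "da/search" and resolve("zh-Hant/phonebook") returns "zh/stroke", matching the table of requested and actual types earlier in this ticket.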


Change History

comment:1 Changed 2 years ago by emmons

  • Owner changed from anybody to markus
  • Priority changed from assess to medium
  • Status changed from new to assigned
  • Milestone changed from UNSCH to 25final

comment:2 Changed 19 months ago by markus

  • Status changed from assigned to reviewing
  • Review set to pedberg

comment:3 Changed 19 months ago by pedberg

  • Status changed from reviewing to closed
  • Resolution set to fixed
