Unicode Common Locale Data Repository: Bug Tracking

CLDR Ticket #10154 (closed survey: fixed)

Opened 14 months ago

Last modified 12 months ago

Dashboard: suppress warning about non-exemplar characters for cities

Reported by: fredrik
Owned by: emmons
Component: survey
Data Locale:
Phase: dsub
Review: fredrik
Weeks:
Data Xpath:


At the CLDR prep meeting on 3/28/17, it was noted that a common warning in the Dashboard concerns translations of city names that contain characters not included in the locale's exemplar characters.

Often this is a false positive. Many locales adopt characters from the original spelling of a city name without wanting to add those characters to the exemplar characters. We should consider suppressing the warning as long as the script matches the target locale.
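A minimal sketch of the proposed rule, assuming the exemplars are given as a plain string and that the locale's script is the first non-Common, non-Inherited script found among them. It uses the JDK's java.lang.Character.UnicodeScript in place of ICU4J's UScript so it runs without dependencies, and it does no case folding (uppercase letters pass only because they are in the locale's script). The class and method names are hypothetical, not CLDR's actual checking code.

```java
// Hypothetical sketch of the rule proposed in this ticket: a character outside the
// exemplars is tolerated when its script is the locale's own script.
// Uses java.lang.Character.UnicodeScript (JDK) instead of ICU4J's UScript.
final class ScriptMatchCheck {

    // The locale's script: the first exemplar whose script is not Common or Inherited
    // (null if there is none, in which case every non-exemplar character is flagged).
    static Character.UnicodeScript primaryScript(String exemplars) {
        return exemplars.codePoints()
                .mapToObj(Character.UnicodeScript::of)
                .filter(s -> s != Character.UnicodeScript.COMMON
                        && s != Character.UnicodeScript.INHERITED)
                .findFirst()
                .orElse(null);
    }

    // Characters that would still trigger the warning; an empty result means no warning.
    static String disallowed(String value, String exemplars) {
        Character.UnicodeScript script = primaryScript(exemplars);
        return value.codePoints()
                .filter(cp -> exemplars.indexOf(cp) < 0)                 // not an exemplar
                .filter(cp -> Character.UnicodeScript.of(cp) != script) // and not the locale's script
                .collect(StringBuilder::new, StringBuilder::appendCodePoint, StringBuilder::append)
                .toString();
    }
}
```

With German-like exemplars such as "abcdefghijklmnopqrstuvwxyzäöüß", "Reykjavík" would produce no warning (í is Latin), while "Москва" or "ユニコード" would still have every character flagged.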


Change History

comment:1 Changed 13 months ago by emmons

  • Owner changed from anybody to emmons
  • Priority changed from assess to minor
  • Status changed from new to accepted
  • Milestone changed from UNSCH to 32

comment:2 Changed 12 months ago by mark

The solution in the first commit is incorrect. The ticket requests "We should consider suppressing the warning as long as the script matches the target locale." But the fix disables ALL checks, so I was able to enter "ユニコード" as the German name of "Bermuda", which we clearly don't want.

The right fix is to revert the first commit, then add a new set analogous to exemplarsPlusAscii, called exemplarsFullScript, and use this new set when checking the time zone names.

The new set can be produced from exemplars by something like the following:

        exemplarsFullScript = new UnicodeSet();
        for (String s : exemplars) {
            int script = UScript.getScript(s.codePointAt(0));
            if (script != UScript.COMMON && script != UScript.INHERITED) {
                // At the first explicit script, take all characters of that script and bail.
                // applyIntPropertyValue replaces the set's contents, so the exemplars are re-added below.
                exemplarsFullScript.applyIntPropertyValue(UProperty.SCRIPT, script);
                break;
            }
        }
        exemplarsFullScript.addAll(exemplars);

comment:3 Changed 12 months ago by mark


  1. Go to http://cldr-smoke.unicode.org/smoketest/v#/de/SAmerica/2a67fbb622c4d504
  2. Enter Ԡ (a bizarre Cyrillic character).
  3. You should get the warning "[warn]The characters [Ԡ] {Cyrillic} are not in the exemplar characters. For what to do, see Handling Warnings in Characters."


  1. Go to http://cldr-smoke.unicode.org/smoketest/v#/de/SAmerica/3783dfba84f0bcd5
  2. Enter Ⅎ (a bizarre Latin character).
  3. You should get NO warning, but instead get "[warn]The characters [Ⅎ] {Latin} are not in the exemplar characters. For what to do, see Handling Warnings in Characters."
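Under a script-match rule, the two cases above differ only in the script of the entered character. A small check of those scripts, using the JDK's Character.UnicodeScript (the class name is hypothetical, and treating the German exemplars as Latin-script is an assumption rather than something read from CLDR data):

```java
// Hypothetical demo of the two smoketest characters from comment:3.
final class SmoketestScripts {

    static Character.UnicodeScript scriptOf(int codePoint) {
        return Character.UnicodeScript.of(codePoint);
    }

    public static void main(String[] args) {
        // Assumption: the German exemplars are Latin-script.
        Character.UnicodeScript german = Character.UnicodeScript.LATIN;
        int[] samples = {0x0520, 0x2132}; // the Cyrillic and the Latin test characters
        for (int cp : samples) {
            Character.UnicodeScript s = scriptOf(cp);
            String outcome = (s == german) ? "no warning" : "warning";
            System.out.printf("U+%04X %s -> %s%n", cp, s, outcome);
        }
        // Prints:
        // U+0520 CYRILLIC -> warning
        // U+2132 LATIN -> no warning
    }
}
```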

comment:4 Changed 12 months ago by mark

Also, the code is very inefficient, since it redoes a lot of work for every single value it checks. Instead:

  • Add new fields called exemplarsFullScript, exemplarsFullScriptPlusAscii.
  • Initialize them where exemplars, exemplarsPlusAscii are initialized.
  • Where time zone names are tested, use
        if (null != (disallowed = containsAllCountingParens(exemplarsFullScript, exemplarsFullScriptPlusAscii, value))) {
    instead of
        if (null != (disallowed = containsAllCountingParens(exemplars, exemplarsPlusAscii, value))) {
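A sketch of the suggestion above: the expanded sets are computed once, when the exemplars are set up, so each value check becomes plain set lookups. It substitutes java.util.BitSet and the JDK's Character.UnicodeScript for ICU4J's UnicodeSet and UScript; "plus ASCII" is assumed here to mean printable ASCII (U+0020..U+007E), and the parenthesis counting done by containsAllCountingParens is not modeled. The class and method names are hypothetical.

```java
import java.util.BitSet;

// Hypothetical sketch: precompute exemplarsFullScript / exemplarsFullScriptPlusAscii once.
final class ExemplarSets {
    final BitSet exemplars = new BitSet();
    final BitSet exemplarsFullScript;
    final BitSet exemplarsFullScriptPlusAscii;

    ExemplarSets(String exemplarChars) {
        exemplarChars.codePoints().forEach(exemplars::set);

        // First explicit (non-Common, non-Inherited) script among the exemplars.
        Character.UnicodeScript script = exemplarChars.codePoints()
                .mapToObj(Character.UnicodeScript::of)
                .filter(s -> s != Character.UnicodeScript.COMMON
                        && s != Character.UnicodeScript.INHERITED)
                .findFirst()
                .orElse(null);

        // Exemplars plus every code point of that script: one full scan, done once per locale.
        exemplarsFullScript = (BitSet) exemplars.clone();
        if (script != null) {
            for (int cp = 0; cp <= Character.MAX_CODE_POINT; cp++) {
                if (Character.UnicodeScript.of(cp) == script) {
                    exemplarsFullScript.set(cp);
                }
            }
        }
        exemplarsFullScriptPlusAscii = (BitSet) exemplarsFullScript.clone();
        exemplarsFullScriptPlusAscii.set(0x20, 0x7F); // printable ASCII (assumption)
    }

    // Per-value check: one bit lookup per code point, no script computation.
    boolean allAllowed(String value, boolean allowAscii) {
        BitSet allowed = allowAscii ? exemplarsFullScriptPlusAscii : exemplarsFullScript;
        return value.codePoints().allMatch(allowed::get);
    }
}
```

The one-time scan costs roughly a million script lookups per locale; after that, checking a value costs one bit lookup per character.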

comment:5 Changed 12 months ago by emmons

For the case you claim is a failure, you're testing in the wrong section. These changes apply only to exemplar CITIES, not to metazone names, which is what the link points to.

Also, you claim the code is inefficient, but I don't think so: it only checks scripts when a value contains characters that aren't in the exemplars, which is not the usual case.
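The behavior described here can be sketched as follows: the script fallback applies only on exemplar-city paths, and because of the short-circuiting &&, a script lookup happens only for characters missing from the exemplars. The "/exemplarCity" path test, the helper names, and the xpaths in the note below are illustrative assumptions, not taken from the Survey Tool code.

```java
// Hypothetical sketch of the behavior described in comment:5.
final class CityExemplarCheck {

    // Assumption: exemplar-city values live on xpaths containing "/exemplarCity".
    static boolean isExemplarCityPath(String xpath) {
        return xpath.contains("/exemplarCity");
    }

    // True if the value should still get the "not in the exemplar characters" warning.
    static boolean warns(String xpath, String value, String exemplars,
                         Character.UnicodeScript localeScript) {
        boolean relaxed = isExemplarCityPath(xpath);
        return value.codePoints().anyMatch(cp ->
                exemplars.indexOf(cp) < 0   // an exemplar short-circuits here: no script lookup
                && !(relaxed && Character.UnicodeScript.of(cp) == localeScript));
    }
}
```

With German-like exemplars, Ⅎ on a path such as //ldml/dates/timeZoneNames/zone[@type="America/Sao_Paulo"]/exemplarCity would pass, while the same character on a metazone path such as //ldml/dates/timeZoneNames/metazone[@type="Brasilia"]/long/standard would still warn.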

In any event, if you're still not happy with this, I would suggest you take it yourself. I have more pressing matters to work on.

comment:6 Changed 12 months ago by emmons

  • Status changed from accepted to reviewing
  • Review set to fredrik

comment:7 Changed 12 months ago by fredrik

  • Status changed from reviewing to closed
  • Resolution set to fixed
