Unicode Common Locale Data Repository: Bug Tracking

CLDR Ticket #7133 (accepted unittest)

Opened 3 years ago

Last modified 21 months ago

Problems with languageInfo test

Reported by: emmons
Owned by: mark
Component: supplemental
Data Locale:
Phase: rc
Review:
Weeks:
Data Xpath:
Xref:

Description

The following errors still exist in the unit test LanguageInfoTest/testFallbacks, even after updating to the latest ICU4J, which should be using the proper fallback data from CLDR. Needs more investigation/fix:

LanguageInfoTest {
  TestChinese (5.222s) Passed
  testBasics (0.007s) Passed
  testFallbacks {
    Error: File LanguageInfoTest.java, Line 90: ab => ru: expected com.ibm.icu.util.ULocale<ru>, got com.ibm.icu.util.ULocale<mul>
    Error: File LanguageInfoTest.java, Line 90: ach => en: expected com.ibm.icu.util.ULocale<en>, got com.ibm.icu.util.ULocale<mul>
    Error: File LanguageInfoTest.java, Line 90: af => nl: expected com.ibm.icu.util.ULocale<nl>, got com.ibm.icu.util.ULocale<mul>
    Error: File LanguageInfoTest.java, Line 90: ak => en: expected com.ibm.icu.util.ULocale<en>, got com.ibm.icu.util.ULocale<mul>
    Error: File LanguageInfoTest.java, Line 90: am => en: expected com.ibm.icu.util.ULocale<en>, got com.ibm.icu.util.ULocale<mul>
    Error: File LanguageInfoTest.java, Line 90: ay => es: expected com.ibm.icu.util.ULocale<es>, got com.ibm.icu.util.ULocale<mul>
    Error: File LanguageInfoTest.java, Line 90: az => ru: expected com.ibm.icu.util.ULocale<ru>, got com.ibm.icu.util.ULocale<mul>
    Error: File LanguageInfoTest.java, Line 90: be => ru: expected com.ibm.icu.util.ULocale<ru>, got com.ibm.icu.util.ULocale<mul>
    Error: File LanguageInfoTest.java, Line 90: bem => en: expected com.ibm.icu.util.ULocale<en>, got com.ibm.icu.util.ULocale<mul>
    Error: File LanguageInfoTest.java, Line 90: bh => hi: expected com.ibm.icu.util.ULocale<hi>, got com.ibm.icu.util.ULocale<mul>
    Error: File LanguageInfoTest.java, Line 90: bn => en: expected com.ibm.icu.util.ULocale<en>, got com.ibm.icu.util.ULocale<mul>
    Error: File LanguageInfoTest.java, Line 90: br => fr: expected com.ibm.icu.util.ULocale<fr>, got com.ibm.icu.util.ULocale<mul>
    Error: File LanguageInfoTest.java, Line 90: ceb => fil: expected com.ibm.icu.util.ULocale<fil>, got com.ibm.icu.util.ULocale<mul>
    Error: File LanguageInfoTest.java, Line 90: chr => en: expected com.ibm.icu.util.ULocale<en>, got com.ibm.icu.util.ULocale<mul>
    Error: File LanguageInfoTest.java, Line 90: ckb => ar: expected com.ibm.icu.util.ULocale<ar>, got com.ibm.icu.util.ULocale<mul>
    Error: File LanguageInfoTest.java, Line 90: co => fr: expected com.ibm.icu.util.ULocale<fr>, got com.ibm.icu.util.ULocale<mul>
    Error: File LanguageInfoTest.java, Line 90: crs => fr: expected com.ibm.icu.util.ULocale<fr>, got com.ibm.icu.util.ULocale<mul>
    Error: File LanguageInfoTest.java, Line 90: cy => en: expected com.ibm.icu.util.ULocale<en>, got com.ibm.icu.util.ULocale<mul>
    Error: File LanguageInfoTest.java, Line 90: ee => en: expected com.ibm.icu.util.ULocale<en>, got com.ibm.icu.util.ULocale<mul>
    Error: File LanguageInfoTest.java, Line 90: eo => en: expected com.ibm.icu.util.ULocale<en>, got com.ibm.icu.util.ULocale<mul>
    Error: File LanguageInfoTest.java, Line 90: es_MX => es_419: expected com.ibm.icu.util.ULocale<es_419>, got com.ibm.icu.util.ULocale<es>
    Error: File LanguageInfoTest.java, Line 90: et => fi: expected com.ibm.icu.util.ULocale<fi>, got com.ibm.icu.util.ULocale<mul>
    Error: File LanguageInfoTest.java, Line 90: eu => es: expected com.ibm.icu.util.ULocale<es>, got com.ibm.icu.util.ULocale<mul>
    Error: File LanguageInfoTest.java, Line 90: fo => da: expected com.ibm.icu.util.ULocale<da>, got com.ibm.icu.util.ULocale<mul>
    Error: File LanguageInfoTest.java, Line 90: fy => nl: expected com.ibm.icu.util.ULocale<nl>, got com.ibm.icu.util.ULocale<mul>
    Error: File LanguageInfoTest.java, Line 90: ga => en: expected com.ibm.icu.util.ULocale<en>, got com.ibm.icu.util.ULocale<mul>
    Error: File LanguageInfoTest.java, Line 90: gaa => en: expected com.ibm.icu.util.ULocale<en>, got com.ibm.icu.util.ULocale<mul>
    Error: File LanguageInfoTest.java, Line 90: gd => en: expected com.ibm.icu.util.ULocale<en>, got com.ibm.icu.util.ULocale<mul>
    Error: File LanguageInfoTest.java, Line 90: gl => es: expected com.ibm.icu.util.ULocale<es>, got com.ibm.icu.util.ULocale<mul>
    Error: File LanguageInfoTest.java, Line 90: gn => es: expected com.ibm.icu.util.ULocale<es>, got com.ibm.icu.util.ULocale<mul>
    Error: File LanguageInfoTest.java, Line 90: gu => hi: expected com.ibm.icu.util.ULocale<hi>, got com.ibm.icu.util.ULocale<mul>
    Error: File LanguageInfoTest.java, Line 90: ha => en: expected com.ibm.icu.util.ULocale<en>, got com.ibm.icu.util.ULocale<mul>
    Error: File LanguageInfoTest.java, Line 90: haw => en: expected com.ibm.icu.util.ULocale<en>, got com.ibm.icu.util.ULocale<mul>
    Error: File LanguageInfoTest.java, Line 90: ht => fr: expected com.ibm.icu.util.ULocale<fr>, got com.ibm.icu.util.ULocale<mul>
    Error: File LanguageInfoTest.java, Line 90: hy => ru: expected com.ibm.icu.util.ULocale<ru>, got com.ibm.icu.util.ULocale<mul>
    Error: File LanguageInfoTest.java, Line 90: ia => en: expected com.ibm.icu.util.ULocale<en>, got com.ibm.icu.util.ULocale<mul>
    Error: File LanguageInfoTest.java, Line 90: ig => en: expected com.ibm.icu.util.ULocale<en>, got com.ibm.icu.util.ULocale<mul>
    Error: File LanguageInfoTest.java, Line 90: is => en: expected com.ibm.icu.util.ULocale<en>, got com.ibm.icu.util.ULocale<mul>
    Error: File LanguageInfoTest.java, Line 90: jv => id: expected com.ibm.icu.util.ULocale<id>, got com.ibm.icu.util.ULocale<mul>
    Error: File LanguageInfoTest.java, Line 90: ka => en: expected com.ibm.icu.util.ULocale<en>, got com.ibm.icu.util.ULocale<mul>
    Error: File LanguageInfoTest.java, Line 90: kg => fr: expected com.ibm.icu.util.ULocale<fr>, got com.ibm.icu.util.ULocale<mul>
    Error: File LanguageInfoTest.java, Line 90: kk => ru: expected com.ibm.icu.util.ULocale<ru>, got com.ibm.icu.util.ULocale<mul>
    Error: File LanguageInfoTest.java, Line 90: km => en: expected com.ibm.icu.util.ULocale<en>, got com.ibm.icu.util.ULocale<mul>
    Error: File LanguageInfoTest.java, Line 90: kn => en: expected com.ibm.icu.util.ULocale<en>, got com.ibm.icu.util.ULocale<mul>
    Error: File LanguageInfoTest.java, Line 90: kri => en: expected com.ibm.icu.util.ULocale<en>, got com.ibm.icu.util.ULocale<mul>
    Error: File LanguageInfoTest.java, Line 90: ku => tr: expected com.ibm.icu.util.ULocale<tr>, got com.ibm.icu.util.ULocale<mul>
    Error: File LanguageInfoTest.java, Line 90: ky => ru: expected com.ibm.icu.util.ULocale<ru>, got com.ibm.icu.util.ULocale<mul>
    Error: File LanguageInfoTest.java, Line 90: la => it: expected com.ibm.icu.util.ULocale<it>, got com.ibm.icu.util.ULocale<mul>
    Error: File LanguageInfoTest.java, Line 90: lg => en: expected com.ibm.icu.util.ULocale<en>, got com.ibm.icu.util.ULocale<mul>
    Error: File LanguageInfoTest.java, Line 90: ln => fr: expected com.ibm.icu.util.ULocale<fr>, got com.ibm.icu.util.ULocale<mul>
    Error: File LanguageInfoTest.java, Line 90: lo => th: expected com.ibm.icu.util.ULocale<th>, got com.ibm.icu.util.ULocale<mul>
    Error: File LanguageInfoTest.java, Line 90: loz => en: expected com.ibm.icu.util.ULocale<en>, got com.ibm.icu.util.ULocale<mul>
    Error: File LanguageInfoTest.java, Line 90: lua => fr: expected com.ibm.icu.util.ULocale<fr>, got com.ibm.icu.util.ULocale<mul>
    Error: File LanguageInfoTest.java, Line 90: mfe => en: expected com.ibm.icu.util.ULocale<en>, got com.ibm.icu.util.ULocale<mul>
    Error: File LanguageInfoTest.java, Line 90: mg => fr: expected com.ibm.icu.util.ULocale<fr>, got com.ibm.icu.util.ULocale<mul>
    Error: File LanguageInfoTest.java, Line 90: mi => en: expected com.ibm.icu.util.ULocale<en>, got com.ibm.icu.util.ULocale<mul>
    Error: File LanguageInfoTest.java, Line 90: mk => bg: expected com.ibm.icu.util.ULocale<bg>, got com.ibm.icu.util.ULocale<mul>
    Error: File LanguageInfoTest.java, Line 90: ml => en: expected com.ibm.icu.util.ULocale<en>, got com.ibm.icu.util.ULocale<mul>
    Error: File LanguageInfoTest.java, Line 90: mn => ru: expected com.ibm.icu.util.ULocale<ru>, got com.ibm.icu.util.ULocale<mul>
    Error: File LanguageInfoTest.java, Line 90: mr => hi: expected com.ibm.icu.util.ULocale<hi>, got com.ibm.icu.util.ULocale<mul>
    Error: File LanguageInfoTest.java, Line 90: mt => en: expected com.ibm.icu.util.ULocale<en>, got com.ibm.icu.util.ULocale<mul>
    Error: File LanguageInfoTest.java, Line 90: my => en: expected com.ibm.icu.util.ULocale<en>, got com.ibm.icu.util.ULocale<mul>
    Error: File LanguageInfoTest.java, Line 90: ne => en: expected com.ibm.icu.util.ULocale<en>, got com.ibm.icu.util.ULocale<mul>
    Error: File LanguageInfoTest.java, Line 90: nso => en: expected com.ibm.icu.util.ULocale<en>, got com.ibm.icu.util.ULocale<mul>
    Error: File LanguageInfoTest.java, Line 90: ny => en: expected com.ibm.icu.util.ULocale<en>, got com.ibm.icu.util.ULocale<mul>
    Error: File LanguageInfoTest.java, Line 90: nyn => en: expected com.ibm.icu.util.ULocale<en>, got com.ibm.icu.util.ULocale<mul>
    Error: File LanguageInfoTest.java, Line 90: oc => fr: expected com.ibm.icu.util.ULocale<fr>, got com.ibm.icu.util.ULocale<mul>
    Error: File LanguageInfoTest.java, Line 90: om => en: expected com.ibm.icu.util.ULocale<en>, got com.ibm.icu.util.ULocale<mul>
    Error: File LanguageInfoTest.java, Line 90: or => en: expected com.ibm.icu.util.ULocale<en>, got com.ibm.icu.util.ULocale<mul>
    Error: File LanguageInfoTest.java, Line 90: pa => en: expected com.ibm.icu.util.ULocale<en>, got com.ibm.icu.util.ULocale<mul>
    Error: File LanguageInfoTest.java, Line 90: pcm => en: expected com.ibm.icu.util.ULocale<en>, got com.ibm.icu.util.ULocale<mul>
    Error: File LanguageInfoTest.java, Line 90: ps => en: expected com.ibm.icu.util.ULocale<en>, got com.ibm.icu.util.ULocale<mul>
    Error: File LanguageInfoTest.java, Line 90: qu => es: expected com.ibm.icu.util.ULocale<es>, got com.ibm.icu.util.ULocale<mul>
    Error: File LanguageInfoTest.java, Line 90: rm => de: expected com.ibm.icu.util.ULocale<de>, got com.ibm.icu.util.ULocale<mul>
    Error: File LanguageInfoTest.java, Line 90: rn => en: expected com.ibm.icu.util.ULocale<en>, got com.ibm.icu.util.ULocale<mul>
    Error: File LanguageInfoTest.java, Line 90: rw => fr: expected com.ibm.icu.util.ULocale<fr>, got com.ibm.icu.util.ULocale<mul>
    Error: File LanguageInfoTest.java, Line 90: sa => hi: expected com.ibm.icu.util.ULocale<hi>, got com.ibm.icu.util.ULocale<mul>
    Error: File LanguageInfoTest.java, Line 90: sd => en: expected com.ibm.icu.util.ULocale<en>, got com.ibm.icu.util.ULocale<mul>
    Error: File LanguageInfoTest.java, Line 90: si => en: expected com.ibm.icu.util.ULocale<en>, got com.ibm.icu.util.ULocale<mul>
    Error: File LanguageInfoTest.java, Line 90: sn => en: expected com.ibm.icu.util.ULocale<en>, got com.ibm.icu.util.ULocale<mul>
    Error: File LanguageInfoTest.java, Line 90: so => en: expected com.ibm.icu.util.ULocale<en>, got com.ibm.icu.util.ULocale<mul>
    Error: File LanguageInfoTest.java, Line 90: sq => en: expected com.ibm.icu.util.ULocale<en>, got com.ibm.icu.util.ULocale<mul>
    Error: File LanguageInfoTest.java, Line 90: st => en: expected com.ibm.icu.util.ULocale<en>, got com.ibm.icu.util.ULocale<mul>
    Error: File LanguageInfoTest.java, Line 90: su => id: expected com.ibm.icu.util.ULocale<id>, got com.ibm.icu.util.ULocale<mul>
    Error: File LanguageInfoTest.java, Line 90: sw => en: expected com.ibm.icu.util.ULocale<en>, got com.ibm.icu.util.ULocale<mul>
    Error: File LanguageInfoTest.java, Line 90: ta => en: expected com.ibm.icu.util.ULocale<en>, got com.ibm.icu.util.ULocale<mul>
    Error: File LanguageInfoTest.java, Line 90: te => en: expected com.ibm.icu.util.ULocale<en>, got com.ibm.icu.util.ULocale<mul>
    Error: File LanguageInfoTest.java, Line 90: tg => ru: expected com.ibm.icu.util.ULocale<ru>, got com.ibm.icu.util.ULocale<mul>
    Error: File LanguageInfoTest.java, Line 90: ti => en: expected com.ibm.icu.util.ULocale<en>, got com.ibm.icu.util.ULocale<mul>
    Error: File LanguageInfoTest.java, Line 90: tk => ru: expected com.ibm.icu.util.ULocale<ru>, got com.ibm.icu.util.ULocale<mul>
    Error: File LanguageInfoTest.java, Line 90: tlh => en: expected com.ibm.icu.util.ULocale<en>, got com.ibm.icu.util.ULocale<mul>
    Error: File LanguageInfoTest.java, Line 90: tn => en: expected com.ibm.icu.util.ULocale<en>, got com.ibm.icu.util.ULocale<mul>
    Error: File LanguageInfoTest.java, Line 90: to => en: expected com.ibm.icu.util.ULocale<en>, got com.ibm.icu.util.ULocale<mul>
    Error: File LanguageInfoTest.java, Line 90: tt => ru: expected com.ibm.icu.util.ULocale<ru>, got com.ibm.icu.util.ULocale<mul>
    Error: File LanguageInfoTest.java, Line 90: tum => en: expected com.ibm.icu.util.ULocale<en>, got com.ibm.icu.util.ULocale<mul>
    Error: File LanguageInfoTest.java, Line 90: ug => zh: expected com.ibm.icu.util.ULocale<zh>, got com.ibm.icu.util.ULocale<mul>
    Error: File LanguageInfoTest.java, Line 90: ur => en: expected com.ibm.icu.util.ULocale<en>, got com.ibm.icu.util.ULocale<mul>
    Error: File LanguageInfoTest.java, Line 90: uz => ru: expected com.ibm.icu.util.ULocale<ru>, got com.ibm.icu.util.ULocale<mul>
    Error: File LanguageInfoTest.java, Line 90: wo => fr: expected com.ibm.icu.util.ULocale<fr>, got com.ibm.icu.util.ULocale<mul>
    Error: File LanguageInfoTest.java, Line 90: xh => en: expected com.ibm.icu.util.ULocale<en>, got com.ibm.icu.util.ULocale<mul>
    Error: File LanguageInfoTest.java, Line 90: yi => en: expected com.ibm.icu.util.ULocale<en>, got com.ibm.icu.util.ULocale<mul>
    Error: File LanguageInfoTest.java, Line 90: yo => en: expected com.ibm.icu.util.ULocale<en>, got com.ibm.icu.util.ULocale<mul>
    Error: File LanguageInfoTest.java, Line 90: zu => en: expected com.ibm.icu.util.ULocale<en>, got com.ibm.icu.util.ULocale<mul>
  } (0.056s) FAILED (103 failures)
} (5.288s) FAILED (103 failures)

Error summary:
LanguageInfoTest/testFallbacks


<< 103 TEST(S) FAILED >>

Will log this ticket as a known issue in the test.

Change History

comment:1 Changed 3 years ago by emmons

  • Owner changed from anybody to mark
  • Priority changed from assess to medium
  • Status changed from new to assigned
  • Component changed from unknown to test
  • Milestone changed from UNSCH to 26rc

comment:2 Changed 3 years ago by tomzhang

I traced this problem and ended up with a fairly lengthy analysis. The problem involves both CLDR and ICU code, mainly ICU's LocaleMatcher, which was mainly contributed by Mark, so I am leaving all my findings here:

In the test, we compare two locales (after maximizing them) with respect to language, region, and script, and then check the result against a threshold. Simplified, the code looks like this:

// The following logic is on the ICU side (LocaleMatcher).
diff = 0;
diff += lang_diff + region_diff + script_diff;
check = max(1 - diff, 0);
if (check < threshold) {
  return default_locale;  // which is "mul" in the tests
}
// "percent" is specified in the XML, and I assume it means "how closely
// these two locales are related", e.g. 99 is closer than 98.
// The values below are the ones set in the CLDR unit test
// (data from languageInfo.xml).

lang_diff = 1 - percent/100 = 0.9   // percent is specified in the XML; percent = 10 is used for the tests
// These are the default values:
region_diff = 1 - 96/100 = 0.04     // last line
script_diff = 1 - 20/100 = 0.8      // second-to-last line, which I guess is wrong; it may be 0.2 according to the level enum in ICU's LocaleMatcher.java
threshold = 0.5;

So whenever we map a locale to a different locale, lang_diff (0.9) is always added; once script_diff is added as well, diff = 1.7 → check = 0 < threshold → fall back to the default locale → the test fails.
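
The arithmetic of this simplified model can be checked with a small Java sketch. It only illustrates the pseudocode above and is not ICU's LocaleMatcher implementation; the class and method names are hypothetical. Values are kept as integer percent points (so the 0.5 threshold becomes 50) to avoid floating-point rounding.

```java
// Sketch of the simplified scoring described in this comment.
// NOT ICU's actual LocaleMatcher code; all names here are hypothetical.
final class SimplifiedMatchScore {
    // Threshold of 0.5 from the ticket, expressed in percent points.
    static final int THRESHOLD = 50;

    /** Distance contributed by one level: 1 - percent/100, in percent points. */
    static int diff(int percent) {
        return 100 - percent;
    }

    /** check = max(1 - (lang_diff + region_diff + script_diff), 0), in percent points. */
    static int check(int langPercent, int regionPercent, int scriptPercent) {
        int total = diff(langPercent) + diff(regionPercent) + diff(scriptPercent);
        return Math.max(100 - total, 0);
    }

    /** Returns the candidate if the score reaches the threshold, else the default locale. */
    static String pick(String candidate, String defaultLocale, int score) {
        return score < THRESHOLD ? defaultLocale : candidate;
    }

    public static void main(String[] args) {
        // Values from the ticket: language percent 10, region percent 96, script percent 20.
        int score = check(10, 96, 20);
        System.out.println("check = " + score);        // prints "check = 0"
        System.out.println(pick("ru", "mul", score));  // prints "mul"
    }
}
```

With the values quoted above, the three distances add up to 90 + 4 + 80 = 174 percent points, so the score is clamped to 0 and the default locale is returned.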

My suggestion is that we change “percent” from 10 to 30 for the general mappings and from 20 to 80 for the default script mapping, like this:

- <languageMatch desired="yi" supported="en" percent="10" oneway="true" />
- <languageMatch desired="*_*" supported="*_*" percent="20" /> <!-- [Default value - must be at end!] Normally there is little comprehension of different scripts. -->

+ <languageMatch desired="yi" supported="en" percent="30" oneway="true" />
+ <languageMatch desired="*_*" supported="*_*" percent="80" /> <!-- [Default value - must be at end!] Normally there is little comprehension of different scripts. -->

This solves most of the failures. For this part, I think ICU was fine, but either our data was wrong or the way we initialized it was wrong. Fixing either one will make it work.

One test still fails after this change:
<languageMatch desired="es_MX" supported="es_419" percent="15" oneway="true"/>

The simple reason is that order matters: if we move the line above from the bottom to the top, so that it comes before the following line (which currently precedes it), it works:
<languageMatch desired="es_*_419" supported="es_*_*" percent="99" /> <!-- Make 419 a bit closer to each one than they are to one another. -->

Here are more details: in ICU, when region_diff is checked, the lookup returns as soon as it finds a match (in this case es_*_*). So “es” was treated the same as “es_419”, although “es_419” actually has a small value specified later. Thus, moving the line up makes the test pass.
Note: fortunately we have “es” before “es_419”; otherwise “es_419” would be used for most tests.
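
The ordering effect can be illustrated with a minimal Java sketch of a first-match rule lookup. This is an assumption-laden illustration, not ICU's code: the class, the field-wise wildcard matching, and the two three-field patterns are simplified stand-ins for the languageInfo.xml entries quoted above, and only the percent values 99 and 15 are taken from the ticket.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical first-match rule table; NOT ICU's LocaleMatcher implementation.
final class FirstMatchRules {
    /** One simplified rule: underscore-separated fields, where "*" matches any field. */
    static final class Rule {
        final String desired;
        final String supported;
        final int percent;

        Rule(String desired, String supported, int percent) {
            this.desired = desired;
            this.supported = supported;
            this.percent = percent;
        }
    }

    /** Field-wise comparison; a "*" in the pattern matches any value in that position. */
    static boolean matches(String pattern, String tag) {
        String[] p = pattern.split("_");
        String[] t = tag.split("_");
        if (p.length != t.length) {
            return false;
        }
        for (int i = 0; i < p.length; i++) {
            if (!p[i].equals("*") && !p[i].equals(t[i])) {
                return false;
            }
        }
        return true;
    }

    /** Returns the percent of the FIRST rule matching both tags, or -1 if none matches. */
    static int firstMatchPercent(List<Rule> rules, String desired, String supported) {
        for (Rule r : rules) {
            if (matches(r.desired, desired) && matches(r.supported, supported)) {
                return r.percent; // later, more specific rules are never consulted
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        Rule general = new Rule("es_*_*", "es_*_*", 99);      // simplified stand-in
        Rule specific = new Rule("es_*_MX", "es_*_419", 15);  // simplified stand-in

        // General rule first: it shadows the specific rule.
        System.out.println(firstMatchPercent(
                Arrays.asList(general, specific), "es_Latn_MX", "es_Latn_419")); // prints 99
        // Specific rule first: it is the one that is found.
        System.out.println(firstMatchPercent(
                Arrays.asList(specific, general), "es_Latn_MX", "es_Latn_419")); // prints 15
    }
}
```

The sketch only shows that, under a first-match lookup, the same rule set gives different values depending on order; it makes no claim about which value the real matcher should prefer.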

Personally, I would say this is more of an ICU problem, but maybe it was designed or documented that way? If we move all the “higher percent” entries to the front, we can solve this problem, but that does not seem right to me.

comment:3 Changed 3 years ago by mark

  • Milestone changed from 26rc to 27dsub

Pushing to next release.

comment:4 Changed 3 years ago by markus

  • Phase set to dsub
  • Milestone changed from 27dsub to 27

comment:5 Changed 2 years ago by mark

  • Milestone changed from 27 to 28

comment:6 Changed 2 years ago by mark

  • Phase changed from dsub to rc

comment:7 Changed 2 years ago by markus

  • Type changed from defect to unittest
  • Component changed from test to unknown

comment:8 Changed 2 years ago by srl

  • Status changed from assigned to accepted

comment:9 Changed 2 years ago by emmons

  • Component changed from unknown to supplemental

comment:10 Changed 22 months ago by mark

  • Milestone changed from 28 to 29

comment:11 Changed 21 months ago by emmons

  • Milestone changed from 29 to upcoming

Auto move of all 29 -> upcoming
