[Unicode] Common Locale Data Repository: Bug Tracking

CLDR Ticket #10115 (reviewing data)

Opened 5 months ago

Last modified 6 weeks ago

Run exemplars data through data check

Reported by: sascha
Owned by: sascha
Component: main
Data Locale: az,ms,apd,gba,kr,zlm
Phase: rc
Review: mark
Weeks:
Data Xpath:






Split off from cldrbug:9994 — these still need to be fixed:

Can only one have 1 instance of az_Arab in main, but have in [seed, exemplars]
Can only one have 1 instance of ms_Arab in main, but have in [seed, exemplars]
Missing simple parent (apd) for apd_Latn  in exemplars/main; likely=apd_Latn
Missing simple parent (gba) for gba_Latn  in exemplars/main; likely=null
Missing simple parent (kr) for kr_Arab  in exemplars/main; likely=kr_Arab
Missing simple parent (kr) for kr_Latn  in exemplars/main; likely=null
Missing simple parent (zlm) for zlm_Arab  in exemplars/main; likely=zlm_Arab
Missing simple parent (iu) for iu_Latn  in seed/casing; likely=iu_Latn
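
For readers unfamiliar with the two kinds of failure listed above, here is a minimal sketch of what such a check might look like. It is not the actual CLDR data check: the function name `check_locale_files`, the input layout (a dict from a directory path such as `exemplars/main` to the set of locale IDs found there), and the message wording are all assumptions made for illustration. It also assumes that the simple parent is just the language subtag and that it must live in the same directory as the child.

```python
from collections import defaultdict


def check_locale_files(tree):
    """Report duplicate locales and missing simple parents.

    `tree` maps a path like "exemplars/main" to a set of locale IDs.
    This layout is hypothetical, chosen only for this sketch.
    """
    problems = []
    seen = defaultdict(list)  # (kind, locale) -> top-level directories
    for path, locales in tree.items():
        top, kind = path.split("/", 1)
        for locale in locales:
            seen[(kind, locale)].append(top)
            # Assumption: the simple parent is the bare language subtag.
            parent = locale.split("_", 1)[0]
            if parent != locale and parent not in locales:
                problems.append(
                    f"missing simple parent ({parent}) for {locale} in {path}"
                )
    # A locale should appear only once per kind ("main", "casing", ...).
    for (kind, locale), tops in seen.items():
        if len(tops) > 1:
            problems.append(
                f"more than one instance of {locale} in {kind}: {sorted(tops)}"
            )
    return sorted(problems)
```

Called with a small tree such as `{"seed/main": {"az", "az_Arab"}, "exemplars/main": {"az_Arab", "gba_Latn"}}`, the sketch flags `az_Arab` as duplicated across `seed` and `exemplars`, and `gba_Latn` as lacking a `gba` file next to it.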


Change History

comment:1 Changed 2 months ago by mark

  • Owner changed from anybody to sascha
  • Phase changed from dsub to rc
  • Priority changed from assess to major
  • Status changed from new to accepted
  • Milestone changed from UNSCH to 32

comment:2 Changed 6 weeks ago by sascha

az_Arab: exemplars/main/az_Arab.xml contains one character that isn’t already in seed: U+06B4 ARABIC LETTER GAF WITH THREE DOTS ABOVE ڴ. When checking other sources for exemplars data (fontconfig, Wikipedia), I found none that mention U+06B4 as being used for Azeri. So I’m removing exemplars/main/az_Arab.xml without merging its content into seed.

comment:3 Changed 6 weeks ago by sascha

ms_Arab: exemplars/main/ms_Arab.xml lists U+06A9 ARABIC LETTER KEHEH ک which isn’t already in seed, and other sources [1, 2] confirm that this letter is used for writing the Malay language. In addition to U+06A9, exemplars/main/ms_Arab.xml also lists other characters that aren’t in seed, but I couldn’t find confirmation for those. So I’m removing exemplars/main/ms_Arab.xml, while adding U+06A9 to the exemplar characters in seed.

[1] https://en.wikipedia.org/wiki/Jawi_alphabet
[2] http://www.omniglot.com/writing/malay.htm
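
The comparison described in comments 2 and 3 (which characters does the exemplars file list that seed does not?) can be sketched as a plain set difference. This is an illustration only, not the procedure actually used on the ticket: the helper name `chars_not_in_baseline` is made up, and it assumes both exemplar sets have already been expanded into plain strings of single code points, whereas real CLDR exemplar data is written as UnicodeSet patterns that may contain ranges and multi-character sequences.

```python
import unicodedata


def chars_not_in_baseline(candidate, baseline):
    """List code points present in `candidate` but absent from `baseline`.

    Both arguments are plain strings of already-expanded exemplar
    characters (an assumption of this sketch).
    """
    extra = sorted(set(candidate) - set(baseline))
    return [
        (f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>"))
        for ch in extra
    ]
```

Each returned pair gives the code point and its Unicode name, which is the information needed to look the character up in other sources before deciding whether to merge it.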

comment:4 Changed 6 weeks ago by sascha

apd_Latn: Fixed by adding exemplars data for apd[-Arab]. According to Wikipedia, Sudanese Arabic seems to use the same letters as standard Arabic, so I copied those but marked the exemplars as unconfirmed.

comment:5 Changed 6 weeks ago by sascha

zlm_Arab: Fixed by adding exemplars data for simple parent zlm[-Latn] to exemplars/main. It seems to use the same characters as regular Malaysian, so I’ve taken the exemplars property from ms to come up with data for zlm.

comment:6 Changed 6 weeks ago by sascha

kr_Arab and kr_Latn: Fixed by making Kanuri (Latin) the default for Kanuri. While researching this, I found a better source for the current exemplars data for Kanuri (Latin), so I fixed the exemplars character data as well and added a link to a reference. Also filed ticket:10451 for adding likely subtags for Kanuri.

comment:7 Changed 6 weeks ago by sascha

iu_Latn: The file seed/casing/iu_Latn.xml had no content besides the file identity header. Removed it in change 13547, but this broke the CheckConsistentCasing check (outside unit tests). So I’ve added back the empty file, plus another equally empty file for seed/casing/iu.xml, in change 13549. I’m puzzled why the ‘casing’ test framework insists on seeing a content-free file for Inuktitut while not needing this for other languages; filed cldrbug:10455 in the hope that somebody more familiar with the ‘casing’ tests will have a look.

Last edited 6 weeks ago by sascha
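
One way to spot files like the one described in comment 7, which carry nothing besides the identity header, is sketched below. It is an assumption-laden illustration rather than anything from the CLDR tooling: the function name `has_only_identity` is invented, and the sketch treats a file as content-free when every child of the root element is an `identity` element.

```python
import xml.etree.ElementTree as ET


def has_only_identity(ldml_text):
    """Return True if the LDML document contains only <identity> children.

    `ldml_text` is the file content as a string; the definition of
    "content-free" used here is an assumption of this sketch.
    """
    root = ET.fromstring(ldml_text)
    return all(child.tag == "identity" for child in root)
```

For example, `<ldml><identity><language type="iu"/><script type="Latn"/></identity></ldml>` counts as content-free, while a document that also has a `<metadata>` child does not.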

comment:8 Changed 6 weeks ago by sascha

  • Xref changed from 9994 to 9994, 10451, 10452

gba_Latn: Fixed by renaming to gba. According to Ethnologue, Latin is the only script used for the language even though current CLDR metadata claims Arabic; filed ticket:10452 to get this fixed.

comment:9 Changed 6 weeks ago by sascha

  • Status changed from accepted to reviewing
  • Xref changed from 9994, 10451, 10452 to 9994, 10451, 10452, 10456
  • Review set to mark

Done for exemplars. There’s a probably unrelated bug which was introduced while the checks were disabled; filed cldrbug:10456 for that.

