[Unicode] Common Locale Data Repository: Bug Tracking

CLDR Ticket #9924 (closed data: fixed)

Opened 4 months ago

Last modified 4 months ago

Still wrong: Population data for Cantonese (yue) speakers

Reported by: mark
Owned by: mark
Component: supplemental
Data Locale:
Phase: dsub
Review: pedberg
Weeks:
Data Xpath:


The figures are still wrong. Ticket http://unicode.org/cldr/trac/ticket/9356 was not really fixed.

We have the following in the v30 data file, which is still wrong.

<languagePopulation type="yue" writingPercent="4.3" populationPercent="90" references="R1317"/> <!--Cantonese-->

We need to analyze what happened and why.

Also, the reference is not very authoritative:

<reference type="R1317">about 50% of population in Guangzhou Prov</reference>


Change History

comment:1 Changed 4 months ago by mark

It looks like the spreadsheet version is ok, but the real data file is not.

comment:2 Changed 4 months ago by mark

I reran ConvertLanguageData, and it looks like it was not run previously (or at least the results were not put into CLDR).

A12 Update supplemental data language/country/population from the spreadsheet, and do ConvertLanguageData. https://sites.google.com/site/cldr/development/updating-codes/update-language-script-info

Not sure why this was missed. However,

  1. We should fix the dev charts and trunk data ASAP.
  2. I think this at least deserves a Known Issue.
  3. It may be worth a dot release (or at least inclusion in a dot release if we do another).

comment:3 Changed 4 months ago by mark

Agreed to do, and include in dot release.

comment:4 Changed 4 months ago by mark

  • Owner changed from anybody to mark
  • Priority changed from assess to major
  • Type changed from unknown to data
  • Status changed from new to design
  • Milestone changed from UNSCH to 30.0.3

comment:5 Changed 4 months ago by mark

Regenerated the data, added unit test for yue.

Last release, after we'd generated the data, I'd updated the generation code to canonicalize the locale IDs. That needed a fix to the tests, and verification that downstream code still worked ok. So when I ran the code I found the following changes:

  1. We apparently didn't run after updating GDP data, since that changed also. I saw changes like the following, which are ok.
     <territory type="AC" gdp="37680000" literacyPercent="99" population="940"> <!--Ascension Island-->
     =>
     <territory type="AC" gdp="39290000" literacyPercent="99" population="940"> <!--Ascension Island-->
  2. The canonicalize change is reflected, so I saw changes like the following. Those will be fine once the downstream changes are made (see below).
     <languagePopulation type="tk_Latn" populationPercent="1.7" officialStatus="official_regional"/> <!--Turkmen (Latin)-->
     =>
     <languagePopulation type="tk" populationPercent="1.7" officialStatus="official_regional"/> <!--Turkmen-->
  3. It picks up a change that was just made in ccp, and one in de_IT. Both look fine.
  4. When I run http://cldr.unicode.org/development/updating-codes/likelysubtags, the likely subtags look ok (though there are a lot more of them).
  5. But the defaultContent values are not ok, and would require work.
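
For anyone reviewing regenerated output like the changes listed above, a small script can show which attributes differ between the old and new form of an element. This is only a sketch in Python using the standard library; the helper name attribute_changes is hypothetical and is not part of the CLDR tooling. The territory element is written self-closed here so that it parses on its own.

```python
import xml.etree.ElementTree as ET


def attribute_changes(old_xml: str, new_xml: str) -> dict:
    """Return {attribute: (old_value, new_value)} for every attribute that differs.

    Hypothetical helper for illustration; a missing attribute is reported as None.
    """
    old = ET.fromstring(old_xml).attrib
    new = ET.fromstring(new_xml).attrib
    changes = {}
    for name in sorted(set(old) | set(new)):
        if old.get(name) != new.get(name):
            changes[name] = (old.get(name), new.get(name))
    return changes


# Elements copied from the ticket (trailing XML comments dropped).
old_ac = '<territory type="AC" gdp="37680000" literacyPercent="99" population="940"/>'
new_ac = '<territory type="AC" gdp="39290000" literacyPercent="99" population="940"/>'
old_tk = '<languagePopulation type="tk_Latn" populationPercent="1.7" officialStatus="official_regional"/>'
new_tk = '<languagePopulation type="tk" populationPercent="1.7" officialStatus="official_regional"/>'

print(attribute_changes(old_ac, new_ac))  # {'gdp': ('37680000', '39290000')}
print(attribute_changes(old_tk, new_tk))  # {'type': ('tk_Latn', 'tk')}
```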

Therefore, for the dot-dot release, I recommend just changing one line, which is what I did.

<languagePopulation type="yue" writingPercent="4.3" populationPercent="90" references="R1317"/> <!--Cantonese-->
<languagePopulation type="yue" populationPercent="5.2" references="R1317"/> <!--Cantonese-->

I'll file a separate bug for completing the work in 31. I did leave in a flag that can be used to fix the tests: TestSupplementalInfo.LOCALES_FIXED
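
As a rough illustration of the kind of regression check a unit test for yue might perform, here is a sketch in Python. It is not the test that was added to CLDR. The ceiling MAX_PLAUSIBLE_YUE_PERCENT and both function names are assumptions made up for this example; only the two XML lines come from the ticket.

```python
import xml.etree.ElementTree as ET

# Hypothetical ceiling chosen for illustration only; not a value taken from CLDR.
MAX_PLAUSIBLE_YUE_PERCENT = 10.0


def yue_population_percent(xml_fragment: str) -> float:
    """Extract populationPercent for type="yue" from a fragment of supplemental data."""
    # Wrap the fragment so that one or more sibling elements parse as a single document.
    root = ET.fromstring(f"<root>{xml_fragment}</root>")
    for elem in root.iter("languagePopulation"):
        if elem.get("type") == "yue":
            return float(elem.get("populationPercent"))
    raise ValueError("no languagePopulation entry for yue")


def is_plausible(xml_fragment: str) -> bool:
    """True if the yue percentage does not exceed the assumed ceiling."""
    return yue_population_percent(xml_fragment) <= MAX_PLAUSIBLE_YUE_PERCENT


# The two lines from the ticket (trailing XML comments dropped).
before = '<languagePopulation type="yue" writingPercent="4.3" populationPercent="90" references="R1317"/>'
after = '<languagePopulation type="yue" populationPercent="5.2" references="R1317"/>'

print(is_plausible(before))  # False
print(is_plausible(after))   # True
```

A check of this shape would have flagged the v30 value of 90 while accepting the corrected 5.2, although a real test would more likely compare against the spreadsheet value than against a fixed ceiling.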

comment:6 Changed 4 months ago by mark

Ticket for the rest of the work is ticket:9928

comment:7 Changed 4 months ago by mark

  • Status changed from design to accepted
  • Review set to pedberg

comment:8 Changed 4 months ago by mark

  • Status changed from accepted to reviewing

comment:9 Changed 4 months ago by pedberg

  • Status changed from reviewing to closed
  • Resolution set to fixed

comment:10 Changed 4 months ago by pedberg

  • Component changed from unknown to supplemental
