Still wrong: Population data for Cantonese (yue) speakers

The figures are still wrong. Ticket http://unicode.org/cldr/trac/ticket/9356 was not really fixed.

We have the following in the v30 data file, which is still wrong.

<languagePopulation type="yue" writingPercent="4.3" populationPercent="90" references="R1317"/> <!--Cantonese-->

We need to analyze what happened and why.

Also, the reference is not very authoritative:

<reference type="R1317">about 50% of population in Guangzhou Prov</reference>


Change History

It looks like the spreadsheet version is ok, but the real data file is not.

I reran ConvertLanguageData, and it looks like it was not run for the release (or at least the results were not checked into CLDR).

A12 Update supplemental data language/country/population from the spreadsheet, and do ConvertLanguageData. https://sites.google.com/site/cldr/development/updating-codes/update-language-script-info

Not sure why this was missed. However,

  1. We should fix the dev charts and trunk data asap.
  2. I think this at least deserves a Known Issue.
  3. It may be worth a dot release (or at least inclusion in a dot release if we do another).

Agreed to do, and include in dot release.

Regenerated the data, added unit test for yue.
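
A sanity check of this kind can be sketched as follows. This is a minimal Python illustration, not the actual unit test that was added to CLDR. The embedded XML fragment, the enclosing territory code `CN`, the helper names `population_percent` and `check_yue`, and the 10% ceiling are all assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment modeled on the lines quoted in this ticket.
# The enclosing <territory type="CN"> element is an assumption.
SAMPLE = """
<territoryInfo>
  <territory type="CN">
    <languagePopulation type="yue" populationPercent="5.2" references="R1317"/>
  </territory>
</territoryInfo>
"""

def population_percent(xml_text, territory, language):
    """Return populationPercent for a language in a territory, or None if absent."""
    root = ET.fromstring(xml_text)
    for terr in root.iter("territory"):
        if terr.get("type") != territory:
            continue
        for lp in terr.iter("languagePopulation"):
            if lp.get("type") == language:
                return float(lp.get("populationPercent"))
    return None

def check_yue(xml_text, ceiling=10.0):
    """True if the yue share is present and below the (assumed) ceiling."""
    pct = population_percent(xml_text, "CN", "yue")
    return pct is not None and pct < ceiling
```

With the corrected value of 5.2 the check passes; with the old value of 90 it fails, which is the kind of regression such a test is meant to catch.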

Last release, after we'd generated the data, I'd updated the generation code to canonicalize the locale IDs. That needed a fix to the tests, and verification that downstream code still worked ok. So when I ran the code I found the following changes:

  1. We apparently didn't run after updating GDP data, since that changed also. I saw changes like the following, which are ok.
       <territory type="AC" gdp="37680000" literacyPercent="99" population="940"> <!--Ascension Island-->
       =>
       <territory type="AC" gdp="39290000" literacyPercent="99" population="940"> <!--Ascension Island-->
  2. The canonicalize change is reflected, so I saw changes like the following. Those will be fine once the downstream changes are made (see below).
       <languagePopulation type="tk_Latn" populationPercent="1.7" officialStatus="official_regional"/> <!--Turkmen (Latin)-->
       =>
       <languagePopulation type="tk" populationPercent="1.7" officialStatus="official_regional"/> <!--Turkmen-->
  3. It picks up a change that was just made in ccp, and one in de_IT. Both look fine.
  4. When I run http://cldr.unicode.org/development/updating-codes/likelysubtags, the likely subtags look ok (though there are a lot more of them).
  5. But the defaultContent entries are not ok, and would require work.
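
Differences like the GDP change in item 1 can be surfaced by comparing the old and regenerated output attribute by attribute. The following is a rough Python sketch under stated assumptions, not the tooling actually used: the `<territoryInfo>` wrapper and the helper names `index_elements` and `attribute_changes` are hypothetical. It keys elements by their `type` attribute alone, so it suits `<territory>` elements but would need a compound key for `<languagePopulation>`, where the same language appears under many territories.

```python
import xml.etree.ElementTree as ET

# Old and new values taken from the Ascension Island example in this ticket;
# the <territoryInfo> wrapper is an assumption.
OLD = """<territoryInfo>
  <territory type="AC" gdp="37680000" literacyPercent="99" population="940"/>
</territoryInfo>"""
NEW = """<territoryInfo>
  <territory type="AC" gdp="39290000" literacyPercent="99" population="940"/>
</territoryInfo>"""

def index_elements(xml_text, tag):
    """Map each element's 'type' attribute to a copy of its attributes."""
    root = ET.fromstring(xml_text)
    return {el.get("type"): dict(el.attrib) for el in root.iter(tag)}

def attribute_changes(old_xml, new_xml, tag):
    """List (type, attribute, before, after) for elements present in both inputs."""
    old = index_elements(old_xml, tag)
    new = index_elements(new_xml, tag)
    changes = []
    for key in sorted(old.keys() & new.keys()):
        for attr in sorted(old[key].keys() | new[key].keys()):
            before, after = old[key].get(attr), new[key].get(attr)
            if before != after:
                changes.append((key, attr, before, after))
    return changes
```

Calling `attribute_changes(OLD, NEW, "territory")` on the two fragments reports only the `gdp` attribute of `AC` as changed.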

Therefore, for the dot-dot release, I recommend changing just one line, which is what I did. The old line is shown first, followed by its replacement:

<languagePopulation type="yue" writingPercent="4.3" populationPercent="90" references="R1317"/> <!--Cantonese-->
<languagePopulation type="yue" populationPercent="5.2" references="R1317"/> <!--Cantonese-->

I'll file a separate bug for completing the work in 31. I did leave in a flag that can be used to fix the tests: TestSupplementalInfo.LOCALES_FIXED

The rest of the work is tracked in ticket:9928.

