Unicode Common Locale Data Repository: Bug Tracking
 

CLDR Ticket #10099 (new data)

Opened 4 weeks ago

Last modified 4 weeks ago

Territory-Language Information wildly inaccurate

Reported by: fios@…
Owned by: anybody
Component: unknown
Data Locale:
Phase: dsub
Review:
Weeks:
Data Xpath:
Xref:

Description

The Territory-Language Information chart (/cldr/charts/latest/supplemental/territory_language_information.html) is wildly inaccurate in the way it is given and formatted. It appears to apply the *national* literacy rate across all languages spoken in that country, which does not reflect reality. 99% of Bavarian speakers are simply NOT literate in Bavarian. I would be surprised if the figure even approached 10%, as Bavarian is not recognized as a language and is taught virtually nowhere.

In our case, we noticed this when Mozilla used CLDR data to give literacy statistics for Scottish Gaelic. Looking at the UK, the data has the following serious issues:

1) It lists English, Irish and Scottish Gaelic as official, which is factually wrong. Oddly enough, English is NOT legally the official language, though it is the de facto official language. Technically, through the Welsh Language Act, Welsh is the only official language in the UK. Irish has no legal status in Northern Ireland, and Scottish Gaelic is not official in Scotland either; it has the oddly meaningless legal status of a "language enjoying equal respect".

2) A 99% literacy rate for every language other than English is inaccurate. Sylheti is notorious for not being taught despite its prevalence in the Bangladeshi community. For Scottish Gaelic, the literacy figure is at BEST 37%: the 2011 census (I'd post a link but apparently that's spam...) has no separate category for native-speaker literacy, but the closest measure, "speak, read and write Gaelic", is 37.2%.

Glancing through the table, there is also the issue that it seems to conflate writing systems and languages. Simplified and Traditional Chinese are not the same thing as speakers of Mandarin, because both are equally used to write Mandarin, Cantonese, Wu, etc.

This list, however well-meant, needs either to be marked as a beta version or to be taken off the public site until it is fixed. For starters, I suggest immediately changing the cell formatting so that the national literacy rate applies only to the (de facto) national language unless specific data is available for the other languages (I would imagine this data exists for some languages, such as Basque or Catalan).
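
To make the suggested rule concrete, here is a minimal sketch in Python. It is not how the CLDR chart is actually generated; the function name and parameters are hypothetical and only illustrate the fallback order proposed above: use language-specific data where it exists, apply the national rate only to the (de facto) national language, and otherwise show no figure at all.

```python
from typing import Optional


def literacy_for_language(
    national_rate: float,
    is_national_language: bool,
    language_specific_rate: Optional[float] = None,
) -> Optional[float]:
    """Return the literacy rate to show for one language in one territory.

    Hypothetical helper, not part of CLDR or its tooling.
    Returns None when no defensible figure exists, so the cell can stay empty.
    """
    # Language-specific data, where available, always wins.
    if language_specific_rate is not None:
        return language_specific_rate
    # The national rate is only a fair proxy for the (de facto) national language.
    if is_national_language:
        return national_rate
    # No data for this language: show nothing rather than a misleading number.
    return None


# Illustrative UK rows, using only the figures quoted in this ticket.
uk_national_rate = 99.0
print(literacy_for_language(uk_national_rate, True))         # English
print(literacy_for_language(uk_national_rate, False, 37.2))  # Scottish Gaelic
print(literacy_for_language(uk_national_rate, False))        # Sylheti, no data
```

Returning an empty value instead of the national rate is the key point: a blank cell is honest about missing data, whereas 99% for Sylheti or Scottish Gaelic is not.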

Change History

comment:1 Changed 4 weeks ago by fios@…

The Gaelic census data is here
