[Unicode] Common Locale Data Repository: Bug Tracking

CLDR Ticket #9408 (accepted data)

Opened 16 months ago

Last modified 7 months ago

Review and fix primarily spoken languages

Reported by: mark
Owned by: rick
Component: supplemental
Data Locale:
Phase: rc
Review:
Weeks:
Data Xpath:
Xref:

Description

In the literacy column of the CLDR data, languages that are primarily spoken are given a small percentage, reflecting the amount of written usage. The figures for the following languages may be in error.

The list needs to be reviewed to determine which of these languages are primarily spoken, and their numbers adjusted accordingly. The first priority is those with populations over 1M; next, those over 100K.

78,028,978 Wu Chinese
43,738,936 Lahnda
41,850,999 Egyptian Arabic
37,714,005 Xiang Chinese
29,911,107 Hakka Chinese
24,709,175 Min Nan Chinese
23,827,337 Algerian Arabic
22,108,209 Gan Chinese
19,452,791 Moroccan Arabic
9,432,810 Marwari
8,754,888 Bavarian
7,857,381 Tunisian Arabic
5,755,957 Baluchi
4,988,810 Betawi
4,802,750 Main-Franconian
4,031,496 Zhuang
3,199,329 Gilaki
2,890,561 Bikol
2,515,415 Goan Konkani
2,438,347 Jamaican Creole English
2,418,366 Krio
2,304,354 Sasak
2,185,573 Batak Toba
1,886,561 Gondi
1,738,686 Dyula
1,493,527 Dogri
1,420,875 Brahui
1,402,465 Mandingo
1,401,963 Beja
1,357,707 Sidamo
1,272,377 Gheg Albanian
1,179,100 Tulu
1,142,099 Nyamwezi
1,121,075 West Flemish
1,004,861 Acoli
973,708 Bakhtiari
956,598 Talysh
828,519 Chimborazo Highland Quichua
796,074 Venetian
795,337 Kongo
786,066 Rajasthani
764,055 Serer
726,247 Tiv
724,769 Picard
658,847 Frafra
645,558 Capiznon
540,806 Mingrelian
531,553 Rusyn
531,286 Susu
526,633 Ligurian
499,803 Silesian
464,394 Kara-Kalpak
429,671 Tamashek
388,535 Banjar
387,931 Pontic
375,772 Kinaray-a
364,620 Riffian
355,866 Fang
349,358 Fiji Hindi
340,679 Yao
323,436 Mongo
312,391 Buriat
285,073 Gayo
234,896 Zeelandic
230,488 Extremaduran
227,958 Saurashtra
218,556 Mandar
188,241 Nzima
180,794 Badaga
180,794 Ao Naga
176,462 Latgalian
163,946 Khowar
159,085 Central Dusun
141,983 Kirmanjki
131,687 Nyasa Tonga
128,282 Selayar
124,079 Pennsylvania German
118,111 Wayuu
104,101 Sassarese Sardinian
83,396 Plautdietsch
71,983 Võro
61,933 Arpitan
57,013 Mentawai
53,451 Bishnupriya
53,368 Tornedalen Finnish
49,979 Kashubian
38,222 Cree
35,019 Lombard
32,227 Braj
30,617 Emilian
29,818 Western Mari
26,724 Cajun French
23,371 Atsam
21,536 Pohnpeian
21,473 Muslim Tat
20,923 Laz
19,724 Central Yupik
18,464 Kaingang
15,616 Tsakhur
11,917 Lower Silesian
9,604 Northern Frisian
9,600 Nheengatu
8,374 Roviana
7,115 Kosraean
7,005 Hiri Motu
6,814 Zoroastrian Dari
6,122 Piedmontese
5,898 Yapese
3,548 Veps
2,838 Turoyo
2,299 Rotuman
2,292 Slave
2,000 Eastern Frisian
959 Saterland Frisian
795 Seri
763 Chipewyan
400 Inupiaq
198 Tsakonian
140 Ingrian
97 Araona
65 Tokelau
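The triage described in the ticket (first the languages over 1M, then those over 100K) can be sketched as a small Python script. This is a hypothetical helper, not part of any CLDR tool: it assumes each row has the form `population name`, as in the list above, and uses a handful of rows from that list as sample input.

```python
"""Sort the ticket's candidate languages into review tiers.

Hypothetical helper: tiers follow the ticket description (first over 1M,
then over 100K); the row format and tier numbers are assumptions.
"""


def parse_row(line: str) -> tuple[int, str]:
    """Split a row like '78,028,978 Wu Chinese' into (population, name)."""
    count, name = line.strip().split(" ", 1)
    return int(count.replace(",", "")), name


def review_tier(population: int) -> int:
    """Return 1 for populations over 1M, 2 for over 100K, otherwise 3."""
    if population > 1_000_000:
        return 1
    if population > 100_000:
        return 2
    return 3


def triage(lines: list[str]) -> dict[int, list[str]]:
    """Group language names by tier, keeping the input order within each tier."""
    tiers: dict[int, list[str]] = {1: [], 2: [], 3: []}
    for line in lines:
        if line.strip():  # skip blank lines
            population, name = parse_row(line)
            tiers[review_tier(population)].append(name)
    return tiers


sample = [
    "78,028,978 Wu Chinese",
    "1,004,861 Acoli",
    "973,708 Bakhtiari",
    "104,101 Sassarese Sardinian",
    "83,396 Plautdietsch",
]
print(triage(sample))
```

Because the list is already sorted by population, each tier keeps the "largest first" order the review would follow.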


Change History

comment:1 Changed 16 months ago by emmons

  • Status changed from new to accepted
  • Component changed from unknown to supplemental
  • Priority changed from assess to medium
  • Phase changed from dsub to rc
  • Milestone changed from UNSCH to 30
  • Owner changed from anybody to rick
  • Type changed from unknown to data

Work with Mark; maybe do the top 10 first, and then more later.

comment:2 Changed 13 months ago by federicoleva@…

I see the list contains several languages which are currently the focus of some language efforts at Wikimedia. If you can share the method you use to review these numbers, maybe we can help with some of the rows.

comment:3 Changed 12 months ago by rick

@federicoleva: If you have specific data concerning language codes, territories, and populations for any of these, it would be helpful.

In the CLDR supplemental data for a primarily spoken language, we estimate the share of the population that uses it for written communication, for example 5%. What we need is the literacy rate for the language's population in each territory: how much the language is used for written communication, not just how much it is spoken.

This list also doesn't give 2- or 3-letter language codes or territories, so quite a lot of research still needs to be done.
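Rick's comment describes the estimate as a percentage of the speaking population, gathered per territory. A minimal sketch of that arithmetic follows, assuming a flat 5% default (the example rate from the comment). The function names and the territory-keyed input are hypothetical and not part of CLDR's tooling; the territory codes in the example are placeholders.

```python
"""Estimate how many people use a primarily spoken language in writing.

Hypothetical helpers: the 5% default is the example rate from the
ticket comment, and the territory codes used below are placeholders.
"""


def estimated_writers(speakers: int, writing_percent: float = 5.0) -> int:
    """Return the estimated written-usage population, rounded to a whole person."""
    return round(speakers * writing_percent / 100)


def total_writers(by_territory: dict[str, tuple[int, float]]) -> int:
    """Sum per-territory estimates; each value is (speakers, writing_percent)."""
    return sum(
        estimated_writers(speakers, percent)
        for speakers, percent in by_territory.values()
    )


# Egyptian Arabic total from the ticket's list, at the default 5%.
print(estimated_writers(41_850_999))

# Placeholder territories "AA" and "BB" with made-up rates.
print(total_writers({"AA": (1_000_000, 5.0), "BB": (200_000, 10.0)}))
```

Keeping a separate rate for each territory reflects the comment's point that written usage should be researched territory by territory rather than as one global figure.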

comment:4 Changed 12 months ago by rick

Just sent a longish e-mail to Federico with information about what is needed to help.

comment:5 Changed 12 months ago by rick

  • Milestone changed from 30 to 31

Pushing to 31, and we'll see if we can get more detailed information for any/all of these in that timeframe.

comment:6 Changed 7 months ago by rick

  • Milestone changed from 31 to upcoming