Unicode Common Locale Data Repository: Bug Tracking

CLDR Ticket #9938 (closed data: needs-more-info)

Opened 17 months ago

Last modified 9 months ago

Update language data for (mostly) Nordic countries (and a few other countries)

Reported by: kent.karlsson14@…	Owned by: rick
Component: supplemental	Data Locale:
Phase: rc	Review:
Weeks:	Data Xpath:


See http://unicode.org/cldr/trac/ticket/9919#comment:4 and follow-on comments to that ticket.
Much language data is missing.


Change History

comment:1 follow-up: ↓ 2 Changed 16 months ago by kent.karlsson14@…

Language-PopulationPercent data for some countries.

XML snippets for Nordic (including Greenland, since it has a close connection to Denmark) and Baltic countries.
And some comments about Germany and France (not completed).

Very few countries have official statistics on the matter (Finland is one of those few), and even then
they usually cover only "primary language". So much of this is taken from estimates from various sources,
in some cases even guesstimates. If you have higher-quality data, please use that. Even if not, the
proposal here is an improvement over the current data on this matter in CLDR.

A few countries have quite a lot of data, with a long "tail". It does not seem sensible to include
all of the "tail" in the CLDR data file. I've made a preliminary "cut" at a fairly inclusive level.
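The "cut" can be sketched as a threshold over (code, percent) pairs. The Python below is a minimal illustration, not the procedure actually used for this proposal: the 0.1% cut-off, the rule that entries with an officialStatus are always kept, and the names `Entry`, `apply_cut` and `to_xml_line` are all assumptions. The Italian value (2492 speakers out of 5502284, about 0.045%) is derived from the Finnish list further down.

```python
# Hypothetical sketch: apply a "long tail" cut to language/percentage pairs
# and emit CLDR-style <languagePopulation> lines. The 0.1% cut-off and the
# rule that entries with an officialStatus are always kept are assumptions.
from typing import NamedTuple, Optional


class Entry(NamedTuple):
    code: str
    percent: float
    official_status: Optional[str] = None


def apply_cut(entries, cutoff=0.1):
    """Keep entries at or above the cut-off, plus any with an official status,
    sorted by descending percentage."""
    kept = [e for e in entries if e.percent >= cutoff or e.official_status]
    return sorted(kept, key=lambda e: e.percent, reverse=True)


def to_xml_line(e):
    """Format one entry as a <languagePopulation> element."""
    attrs = f'type="{e.code}" populationPercent="{e.percent:g}"'
    if e.official_status:
        attrs += f' officialStatus="{e.official_status}"'
    return f"<languagePopulation {attrs}/>"


if __name__ == "__main__":
    # A few Finnish figures from the proposal, plus Italian (below the cut).
    fi = [
        Entry("sv", 47, "official"),
        Entry("fi", 92, "official"),
        Entry("it", 0.045),  # 2492 / 5502284 speakers, below the assumed cut
        Entry("se", 0.04, "official_regional"),
        Entry("ru", 1.3),
    ]
    for e in apply_cut(fi):
        print(to_xml_line(e))
```

Keeping official-status entries regardless of size mirrors the proposal, where for example Inari and Skolt Sami (0.01%) are still listed for Finland.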

		<territory type="AX" gdp="1563000000" literacyPercent="99" population="28666">	<!--Åland-->
			<languagePopulation type="sv"  populationPercent="95" officialStatus="official" />	<!--Swedish-->
			<languagePopulation type="en"  populationPercent="90" />	<!--English-->
			<languagePopulation type="fi"  populationPercent="5" officialStatus="official_minority" />	<!--Finnish-->
			<languagePopulation type="lv"  populationPercent="1.0" />	<!--Latvian-->
			<languagePopulation type="ro"  populationPercent="1.0" />	<!--Romanian-->
			<languagePopulation type="et"  populationPercent="0.7" />	<!--Estonian-->
			<languagePopulation type="ru"  populationPercent="0.5" />	<!--Russian-->
			<languagePopulation type="th"  populationPercent="0.5" />	<!--Thai-->
			<languagePopulation type="fss" populationPercent="0.2" officialStatus="official_minority" />	<!--Finland-Swedish sign language-->
		</territory>

		<territory type="DK" gdp="264837000000" literacyPercent="99" population="5745526">	<!--Denmark-->
			<languagePopulation type="da"  populationPercent="93" officialStatus="official" />	<!--Danish-->
			<languagePopulation type="en"  populationPercent="95" />	<!--English-->
			<languagePopulation type="jut" populationPercent="30" />	<!--Jutish (guesstimate); mainly spoken, minor attempts at written form revival-->
			<languagePopulation type="ar"  populationPercent="1.2" />	<!--Arabic-->
			<languagePopulation type="tr"  populationPercent="1.1" />	<!--Turkish-->
			<languagePopulation type="de"  populationPercent="1.0" officialStatus="official_regional" />	<!--German-->  
			<languagePopulation type="sh"  populationPercent="0.8" />	<!--Serbian/Croatian/Bosnian, sr/hr/bs-->
			<languagePopulation type="fo"  populationPercent="0.7" />	<!--Faroese, not including the Faroe Islands-->
			<languagePopulation type="ku"  populationPercent="0.7" />	<!--Kurdish/Kurmanji/Sorani-->
			<languagePopulation type="ur"  populationPercent="0.40" />	<!--Urdu-->
			<languagePopulation type="pa"  populationPercent="0.35" />	<!--Punjabi-->
			<languagePopulation type="fa"  populationPercent="0.35" />	<!--Farsi-->
			<languagePopulation type="so"  populationPercent="0.30" />	<!--Somali-->
			<languagePopulation type="nb"  populationPercent="0.28" />	<!--Norwegian-->
			<languagePopulation type="sv"  populationPercent="0.26" />	<!--Swedish-->
			<languagePopulation type="dsl" populationPercent="0.25" officialStatus="official_minority" />	<!--Danish sign language-->
			<languagePopulation type="pl"  populationPercent="0.24" />	<!--Polish-->
			<languagePopulation type="zh"  populationPercent="0.24" />	<!--Chinese/Cantonese-->
			<languagePopulation type="vi"  populationPercent="0.24" />	<!--Vietnamese-->
			<languagePopulation type="kl"  populationPercent="0.21" />	<!--Kalaallisut, not including Greenland-->
		</territory>
Speakers	Code	Language	Share	Countries of origin	Note
11000	fr	French		France, Canada, Belgium, Switzerland, former French colonies	plus as third language?
11000	ru/uk	Russian/Ukrainian		Russia, former Soviet Union
11000	ta	Tamil	0.2%	Sri Lanka
7000	is	Icelandic	0.12%	Iceland
7000	es	Spanish		Spain, Latin America	plus as third language?
7000	...	Berber/Shilha/Tarifit/Tamazight		Morocco, Algeria
7000	th/lo	Thai/Lao		Thailand
6000	nl	Dutch		Netherlands, Belgium
6000	sq	Albanian/Tosk/Gheg	0.1%	Macedonia, Kosovo, Serbia, Albania
5000	ps	Pashto		Afghanistan, Pakistan	maybe more now?
5000	tl	Pilipino/Tagalog		Philippines
4000	fi	Finnish		Finland, Sweden
4000	it	Italian		Italy
3000	pt	Portuguese		Brazil, Portugal, former Portuguese colonies
3000	ro	Romanian		Romania
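The raw speaker counts relate to populationPercent by simple division by the territory's population attribute. A minimal Python sketch, assuming the Denmark figure population="5745526" from the snippet above and rounding to two significant figures (the rounding rule is an assumption, not a CLDR convention); `population_percent` is a hypothetical helper. It gives 0.12 for Icelandic, matching the table, and 0.19 for Tamil, which the table rounds to 0.2%.

```python
# Sketch: turn absolute speaker counts into CLDR-style populationPercent values.
# Rounding to two significant figures is an assumption, not a CLDR rule.


def population_percent(speakers: int, population: int, sig: int = 2) -> float:
    """Return speakers as a percentage of population, rounded to `sig` significant figures."""
    if population <= 0:
        raise ValueError("population must be positive")
    percent = 100.0 * speakers / population
    # The "g" format rounds to significant figures; float() turns it back into a number.
    return float(f"{percent:.{sig}g}")


if __name__ == "__main__":
    DK_POPULATION = 5745526  # population attribute of the DK <territory> above
    tail = {"ta": 11000, "is": 7000, "sq": 6000, "ro": 3000}
    for code, count in tail.items():
        print(code, population_percent(count, DK_POPULATION))
```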

		<territory type="EE" gdp="39000000000" literacyPercent="99" population="1315944">	<!--Estonia-->
			<languagePopulation type="et"  populationPercent="71" officialStatus="official"/>	<!--Estonian-->
			<languagePopulation type="ru"  populationPercent="66" />	<!--Russian, incl. 2nd/3rd lang. (guesstimate)-->
			<languagePopulation type="en"  populationPercent="46" />	<!--English (guesstimate)-->
			<languagePopulation type="de"  populationPercent="22" />	<!--German, 2nd/3rd lang. (guesstimate)-->
			<languagePopulation type="vro" populationPercent="5.7" />	<!--Võro-->
			<languagePopulation type="fi"  populationPercent="0.5" />	<!--Finnish-->
			<languagePopulation type="eso" populationPercent="0.2" />	<!--Estonian sign language-->
			<languagePopulation type="rsl" populationPercent="0.1" />	<!--Russian sign language-->
		</territory>

		<territory type="FI" gdp="27600000000" literacyPercent="99" population="5502284">	<!--Finland (Åland?)-->
			<languagePopulation type="fi"  populationPercent="92" officialStatus="official" />	<!--Finnish-->
			<languagePopulation type="en"  populationPercent="70"/>	<!--English-->
			<languagePopulation type="sv"  populationPercent="47" officialStatus="official" />	<!--Swedish, incl. as 2nd/3rd lang.-->
			<languagePopulation type="ru"  populationPercent="1.3"/>	<!--Russian-->
			<languagePopulation type="et"  populationPercent="0.9"/>	<!--Estonian-->
			<languagePopulation type="so"  populationPercent="0.33"/>	<!--Somali-->
			<languagePopulation type="ar"  populationPercent="0.31"/>	<!--Arabic-->
			<languagePopulation type="fse" populationPercent="0.25"  officialStatus="official_minority" />	<!--Finnish sign language-->
			<languagePopulation type="ku"  populationPercent="0.21"/>	<!--Kurdish-->
			<languagePopulation type="zh"  populationPercent="0.20"/>	<!--Chinese/Cantonese-->
			<languagePopulation type="sq"  populationPercent="0.17"/>	<!--Albanian-->
			<languagePopulation type="fa"  populationPercent="0.16"/>	<!--Farsi-->
			<languagePopulation type="th"  populationPercent="0.16"/>	<!--Thai-->
			<languagePopulation type="vi"  populationPercent="0.15"/>	<!--Vietnamese-->
			<languagePopulation type="tr"  populationPercent="0.13"/>	<!--Turkish-->
			<languagePopulation type="es"  populationPercent="0.13"/>	<!--Spanish-->
			<languagePopulation type="rmf" populationPercent="0.1"   officialStatus="official_minority" />	<!--Kalo Finnish Romani-->
			<languagePopulation type="se"  populationPercent="0.04"  officialStatus="official_regional" />	<!--Northern Sami-->
			<languagePopulation type="smn" populationPercent="0.01"  officialStatus="official_regional" />	<!--Inari Sami-->
			<languagePopulation type="sms" populationPercent="0.01"  officialStatus="official_regional" />	<!--Skolt Sami-->
			<languagePopulation type="fss" populationPercent="0.01"  officialStatus="official_minority" />	<!--Finland-Swedish sign language-->
		</territory>
German (de)	6168
Polish	4794
French	3878
Romanian	3161
Nepali	2951
Tagalog	2932
Bengali, Bangla	2881
Ukrainian	2843
Hungarian	2811
Italian	2492
Urdu	2432
Portuguese	2409
Bulgarian	2313
Bosnian	2176
Hindi	1882
Tamil	1673
Swahili	1641
Dutch	1516
Latvian	1464
Amharic	1387
Serbo-Croatian	1380
Japanese	1278
Greek (modern)	1258
Lithuanian	1215

		<territory type="FO" gdp="1642000000" literacyPercent="99" population="49188">	<!--Faroe Islands-->
			<languagePopulation type="fo"  populationPercent="93" officialStatus="official" />	<!--Faroese-->
			<languagePopulation type="da"  populationPercent="90" officialStatus="official" />	<!--Danish-->
			<languagePopulation type="en"  populationPercent="70"/>	<!--English-->
		</territory>

		<territory type="GL" gdp="1200000000" literacyPercent="99" population="56483">	<!--Greenland-->
			<languagePopulation type="kl"  populationPercent="89" officialStatus="official" />	<!--Kalaallisut-->
			<languagePopulation type="da"  populationPercent="70" />	<!--Danish, mostly 2nd/3rd lang.-->
			<languagePopulation type="en"  populationPercent="60" />	<!--English, as 2nd/3rd language (guesstimate)-->
			<languagePopulation type="iks" populationPercent="0.25" />	<!--Inuit sign language-->
		</territory>

		<territory type="IS" gdp="16146000000" literacyPercent="99" population="332529">	<!--Iceland-->
			<languagePopulation type="is"  populationPercent="93" officialStatus="official"/>	<!--Icelandic-->
			<languagePopulation type="en"  populationPercent="80" />	<!--English (guesstimate)--> 
			<languagePopulation type="da"  populationPercent="60" />	<!--Danish/"Scandinavian" (guesstimate)--> 
			<languagePopulation type="pl"  populationPercent="3.0" />	<!--Polish-->
			<languagePopulation type="lt"  populationPercent="0.43" />	<!--Lithuanian-->
			<languagePopulation type="de"  populationPercent="0.31" />	<!--German-->
			<languagePopulation type="pt"  populationPercent="0.28" />	<!--Portuguese-->
			<languagePopulation type="icl" populationPercent="0.25" officialStatus="official_minority" />	<!--Icelandic sign language--> 
			<languagePopulation type="tl"  populationPercent="0.24" />	<!--Filipino/Tagalog-->
			<languagePopulation type="th"  populationPercent="0.17" />	<!--Thai-->
			<languagePopulation type="lv"  populationPercent="0.14" />	<!--Latvian-->
		</territory>

		<territory type="LT" gdp="86000000000" literacyPercent="99" population="2853500">	<!--Lithuania-->
			<languagePopulation type="lt"  populationPercent="95" officialStatus="official"/>	<!--Lithuanian-->
			<languagePopulation type="ru"  populationPercent="70" />	<!--Russian, mainly as 2nd language-->
			<languagePopulation type="en"  populationPercent="38" />	<!--English, 2nd/3rd language-->
			<languagePopulation type="sgs" populationPercent="15" />	<!--Samogitian, mainly spoken-->
			<languagePopulation type="pl"  populationPercent="9.0" />	<!--Polish-->
			<languagePopulation type="de"  populationPercent="8.0" />	<!--German, 2nd/3rd language-->
			<languagePopulation type="be"  populationPercent="1.3" />	<!--Belarusian-->
			<languagePopulation type="lv"  populationPercent="1.0" />	<!--Latvian-->
			<languagePopulation type="uk"  populationPercent="0.7" />	<!--Ukrainian-->
			<languagePopulation type="lls" populationPercent="0.2" />	<!--Lithuanian sign language-->
			<languagePopulation type="rsl" populationPercent="0.1" />	<!--Russian sign language-->
		</territory>

		<territory type="LV" gdp="51000000000" literacyPercent="99" population="1957200">	<!--Latvia-->
			<languagePopulation type="lv"  populationPercent="80" officialStatus="official" />	<!--Latvian, incl. as 2nd lang.-->
			<languagePopulation type="ru"  populationPercent="80" />	<!--Russian, incl. as 2nd lang.-->
			<languagePopulation type="en"  populationPercent="30" />	<!--English (guesstimate), 2nd/3rd language-->
			<languagePopulation type="de"  populationPercent="12" />	<!--German-->
			<languagePopulation type="ltg" populationPercent="10" />	<!--Latgalian-->
			<languagePopulation type="pl"  populationPercent="2.1" />	<!--Polish-->
			<languagePopulation type="lt"  populationPercent="1.6" />	<!--Lithuanian-->
			<languagePopulation type="et"  populationPercent="0.4" />	<!--Estonian-->
			<languagePopulation type="lsl" populationPercent="0.2" />	<!--Latvian sign language-->
			<languagePopulation type="rsl" populationPercent="0.1" />	<!--Russian sign language-->
		</territory>

		<territory type="NO" gdp="364700000000" literacyPercent="99" population="5236826">	<!--Norway-->
			<languagePopulation type="nb"  populationPercent="97"  officialStatus="official" />	<!--Norwegian Bokmål-->
			<languagePopulation type="en"  populationPercent="90" />	<!--English-->
			<languagePopulation type="nn"  populationPercent="25"  officialStatus="official" />	<!--Norwegian Nynorsk-->
			<languagePopulation type="se"  populationPercent="3.3" officialStatus="official_regional" />	<!--Northern Sami-->
			<languagePopulation type="sv"  populationPercent="1.7" />	<!--Swedish-->
			<languagePopulation type="da"  populationPercent="1.0" />	<!--Danish-->
			<languagePopulation type="ur"  populationPercent="0.8" />	<!--Urdu-->
			<languagePopulation type="ar"  populationPercent="0.7" />	<!--Arabic-->
			<languagePopulation type="vi"  populationPercent="0.5" />	<!--Vietnamese-->
			<languagePopulation type="sq"  populationPercent="0.4" />	<!--Albanian-->
			<languagePopulation type="sh"  populationPercent="0.4" />	<!--Serbian/Croatian/Bosnian, sr/hr/bs-->
			<languagePopulation type="so"  populationPercent="0.35" />	<!--Somali-->
			<languagePopulation type="ku"  populationPercent="0.3" />	<!--Kurdish-->
			<languagePopulation type="es"  populationPercent="0.3" />	<!--Spanish-->
			<languagePopulation type="tr"  populationPercent="0.3" />	<!--Turkish-->
			<languagePopulation type="ta"  populationPercent="0.25" />	<!--Tamil-->
			<languagePopulation type="nsl" populationPercent="0.25" officialStatus="official_minority" />	<!--Norwegian sign language-->
			<languagePopulation type="fkv" populationPercent="0.1"  officialStatus="official_regional" />	<!--Kven Finnish-->
			<languagePopulation type="rom" populationPercent="0.06" officialStatus="official_minority" />	<!--Romani-->
			<languagePopulation type="smj" populationPercent="0.01" officialStatus="official_regional" />	<!--Lule Sami-->
			<languagePopulation type="sma" populationPercent="0.01" officialStatus="official_regional" />	<!--Southern Sami-->
		</territory>

		<territory type="SE" gdp="498130000000" literacyPercent="99" population="9954420">	<!--Sweden-->
			<languagePopulation type="sv"  populationPercent="97" officialStatus="official" />	<!--Swedish-->
			<languagePopulation type="en"  populationPercent="89" />	<!--English-->
			<languagePopulation type="fi"  populationPercent="2.0" officialStatus="official_minority" />	<!--Finnish-->
			<languagePopulation type="ar"  populationPercent="1.55" />	<!--Arabic-->
			<languagePopulation type="sh"  populationPercent="1.30" />	<!--Serbian/Croatian/Bosnian, sr/hr/bs-->
			<languagePopulation type="ku"  populationPercent="0.84" />	<!--Kurdish-->
			<languagePopulation type="pl"  populationPercent="0.pl" />	<!--Polish-->
			<languagePopulation type="es"  populationPercent="0.75" />	<!--Spanish-->
			<languagePopulation type="fa"  populationPercent="0.74" />	<!--Persian-->
			<languagePopulation type="de"  populationPercent="0.72" />	<!--German-->
			<languagePopulation type="da"  populationPercent="0.57" />	<!--Danish-->
			<languagePopulation type="nb"  populationPercent="0.54" />	<!--Norwegian(nb), nn?-->
			<languagePopulation type="so"  populationPercent="0.53" />	<!--Somali-->
			<languagePopulation type="syr" populationPercent="0.52" />	<!--... Aramaic, aii/cld-->
			<languagePopulation type="tr"  populationPercent="0.45" />	<!--Turkish-->
			<languagePopulation type="sq"  populationPercent="0.39" />	<!--Albanian-->
			<languagePopulation type="fit" populationPercent="0.3" officialStatus="official_regional" />	<!--Tornedalen Finnish-->
			<languagePopulation type="rom" populationPercent="0.3" officialStatus="official_minority" />	<!--Romani-->
			<languagePopulation type="th"  populationPercent="0.30" />	<!--Thai-->
			<languagePopulation type="ru"  populationPercent="0.29" />	<!--Russian-->
			<languagePopulation type="swl" populationPercent="0.25" officialStatus="official_minority" />	<!--Swedish sign language-->
			<languagePopulation type="hu"  populationPercent="0.24" />	<!--Hungarian-->
			<languagePopulation type="yue" populationPercent="0.20" />	<!--Cantonese-->
			<languagePopulation type="ti"  populationPercent="0.19" />	<!--Tigrinya-->
			<languagePopulation type="ro"  populationPercent="0.18" />	<!--Romanian-->
			<languagePopulation type="el"  populationPercent="0.16" />	<!--Greek-->
			<languagePopulation type="vi"  populationPercent="0.13" />	<!--Vietnamese-->
			<languagePopulation type="fr"  populationPercent="0.12" />	<!--French-->
			<languagePopulation type="et"  populationPercent="0.12" />	<!--Estonian-->
			<languagePopulation type="nl"  populationPercent="0.11" />	<!--Dutch-->
			<languagePopulation type="pt"  populationPercent="0.10" />	<!--Portuguese-->
			<languagePopulation type="it"  populationPercent="0.10" />	<!--Italian-->
			<!--20,000-40,000 Sami in Sweden; about 8,000 speak Sami-->
			<languagePopulation type="se"  populationPercent="0.06" officialStatus="official_regional" />	<!--Northern Sami-->
			<languagePopulation type="yi"  populationPercent="0.03" officialStatus="official_minority" />	<!--Yiddish-->
			<languagePopulation type="smj" populationPercent="0.01" officialStatus="official_regional" />	<!--Lule Sami-->
			<languagePopulation type="sma" populationPercent="0.01" officialStatus="official_regional" />	<!--Southern Sami-->
		</territory>
Bengali 8000
Urdu 7500
Punjabi 7500
Mandarin 7500
Macedonian 7500
Lithuanian 7000
Czech 6000
Icelandic 6000
Azerbaijani 6000
Amharic 6000
Tagalog (tl) 5500
Armenian 5500
Ukrainian 5000
Slovene 5000
Latvian 5000
Pashto 4000
Turkmen 3500
Tamil 3500
Kinyarwanda/Kirundi 3500
Japanese 3500
Gujarati 3500
Slovak 3000
Malay 3000
Bulgarian 3000
Mandinka 2500
Korean 2500
Uzbek 2000
Sinhala 2000
Oromo 2000
Min Nan 2000
Hindi 2000
Hebrew 2000
Aromanian 2000
Elfdalian (Älvdalska, ovd) 1900 0.02%
Belarusian 1700
Överkalix dialect (Överkalixmål) 1600
Wolof 1600
Burmese 1600
Quechua 1500
Swahili 1300
Mongolian 1300
Lingala 1300
Berber languages 1300
Catalan (ca) 1200
Fulani 1200
Akan 1000
Yoruba 900
Luganda 900
Kazakh 850
Ilokano 850
Igbo 850
Hakka 850
Cebuano 850
Sylheti 800
Khmer 800
Georgian 800
Dinka 800
Dargwa 750
Pangasinan 700
Malayalam 700
Kikongo 700
Kikuyu 600
Hazaragi 600
Tigré 550
Telugu 550
French-based creoles (several different languages) 550
Kyrgyz 500
<!-- http://www.sviv.se/blog/2015/06/660-000-svenskar-bor-utomlands/ (660,000 Swedes live abroad)
USA – 150,000		0.05% of USA pop
United Kingdom – 90,000	0.17% of GB pop
Norway – 90,000		1.7% of NO pop
Spain – 90,000	0.19% of ES pop
France – 30,000
Germany – 23,000
Thailand – 20,000
Finland – 15,000
Denmark – 15,000
Italy – 12,000
-->

<!--I requested deletion of the data line for Interlingua for Sweden. Please also delete Interlingua from the data for France.-->
		<territory type="FR" gdp="2591000000000" literacyPercent="99" population="66553800">	<!--France-->
			<languagePopulation type="fr"  populationPercent="97" officialStatus="official"/>	<!--French-->
			<languagePopulation type="en"  populationPercent="39"/>	<!--English-->
			<languagePopulation type="oc"  writingPercent="5" populationPercent="3" references="R1015"/>	<!--Occitan-->
			<languagePopulation type="it"  populationPercent="1.7"/>	<!--Italian-->
			<languagePopulation type="pt"  populationPercent="1.3"/>	<!--Portuguese-->
			<languagePopulation type="pcd" populationPercent="1.1"/>	<!--Picard-->
			<languagePopulation type="gsw" writingPercent="5" populationPercent="0.91" references="R1125"/>	<!--Swiss German-->
			<languagePopulation type="br"  writingPercent="3" populationPercent="0.83" references="R1138"/>	<!--Breton-->
			<languagePopulation type="co"  writingPercent="5" populationPercent="0.57" references="R1012"/>	<!--Corsican-->
			<languagePopulation type="ca"  populationPercent="0.17"/>	<!--Catalan-->
			<languagePopulation type="nl"  populationPercent="0.13"/>	<!--Dutch-->
			<languagePopulation type="eu"  populationPercent="0.13"/>	<!--Basque-->
			<languagePopulation type="frp" populationPercent="0.094"/>	<!--Arpitan-->
		</territory>
<!--Regional languages	
.Alsatian; Catalan; Corsican; Breton; .Gallo; Occitan; .some languages of New Caledonia; 
.some Walloon; Basque; .(West Flemish dialect); Franco-Provençal; Lorraine Franconian;
French Guiana Creole; .Guadeloupean Creole; .Martiniquan Creole; .Oïl languages;
.Réunion Creole; .Yeniche, .the Maroon creoles and Amerindian languages of French Guiana-->
<!--Main immigrant languages
.Berber, .Arabic, Portuguese, Spanish, Italian, .Polish, 
.Turkish, .German, .Chinese, .Vietnamese, Dutch, English-->
<!--Main foreign languages	
English (34%)	Spanish (13%)	German (8%)	Italian (2%)-->
<!--Sign languages	.French Sign Language fsl-->

<!--Esperanto is listed (only) for San Marino, which seems odd.-->
        <territory type="SM" gdp="1914000000" literacyPercent="96" population="33020">	<!--San Marino-->
            <languagePopulation type="it" populationPercent="89" officialStatus="official"/>	<!--Italian-->
        </territory>
<!--(Not sure where the text file data in the Java source comes from; apparently not the supplementalData.xml file.)-->

*Official minority languages are Danish, Low German, Sorbian, Romani and Frisian.
*Danish, Low German, Wendish (Sorbian), Romani and Frisian are officially recognized and protected under the European Charter for Regional or Minority Languages (Sprogpagten).
German Sign Language (Deutsche Gebärdensprache, DGS)
Danish in Schleswig-Holstein
Low German (incl. Plautdietsch)
Low Franconian in North Rhine-Westphalia, including Limburgish and Kleverlandish
North Frisian in Schleswig-Holstein
Saterland Frisian in Lower Saxony
Sorbian in Lusatia, specifically:
Upper Sorbian in Upper Lusatia, Saxony
Lower Sorbian in Lower Lusatia, Brandenburg
		<territory type="DE" gdp="3748000000000" literacyPercent="99" population="80854400">	<!--Germany-->
			<languagePopulation type="de"  populationPercent="91" officialStatus="official"/>	<!--German-->
			<languagePopulation type="en"  populationPercent="64"/>	<!--English-->
			<languagePopulation type="fr"  populationPercent="18" references="R1304"/>	<!--French-->
			<languagePopulation type="bar" writingPercent="5" populationPercent="17" references="R1318"/>	<!--Bavarian-->
			<languagePopulation type="nds" writingPercent="5" populationPercent="12" officialStatus="official_regional" references="R1167"/>	<!--Low German-->
			<languagePopulation type="nl"  populationPercent="9" references="R1304"/>	<!--Dutch-->
			<languagePopulation type="it"  populationPercent="7" references="R1304"/>	<!--Italian-->
			<languagePopulation type="es"  populationPercent="6" references="R1304"/>	<!--Spanish-->
			<languagePopulation type="ru"  populationPercent="6" references="R1304"/>	<!--Russian-->
			<languagePopulation type="vmf" populationPercent="6"/>	<!--Main-Franconian-->
			<languagePopulation type="tr"  populationPercent="1.8"/>	<!--Turkish-->
			<languagePopulation type="swg" writingPercent="5" populationPercent="1"/>	<!--Swabian-->
			<languagePopulation type="hr"  populationPercent="0.79"/>	<!--Croatian-->
			<languagePopulation type="ku_Latn" populationPercent="0.3"/>	<!--Kurdish (Latin)-->
			<languagePopulation type="el"  populationPercent="0.38"/>	<!--Greek-->
			<languagePopulation type="ksh" populationPercent="0.3" references="R1174"/>	<!--Colognian-->
			<languagePopulation type="pl"  populationPercent="0.29"/>	<!--Polish-->
			<languagePopulation type="gsg" populationPercent="0.2"  officialStatus="official_minority"/>	<!--German Sign Language-->
			<languagePopulation type="rom" populationPercent="0.08" officialStatus="official_minority"/>	<!--Romani-->
			<languagePopulation type="da"  populationPercent="0.06" officialStatus="official_regional"/>	<!--Danish-->
			<languagePopulation type="hsb" writingPercent="5" populationPercent="0.016" officialStatus="official_regional" references="R1292"/>	<!--Upper Sorbian-->
			<languagePopulation type="frr" populationPercent="0.012" officialStatus="official_regional" references="R1095"/>	<!--Northern Frisian-->
			<languagePopulation type="dsb" writingPercent="5" populationPercent="0.0087" officialStatus="official_regional" references="R1222"/>	<!--Lower Sorbian-->
			<languagePopulation type="frs" populationPercent="0.0025" officialStatus="official_regional" references="R1300"/>	<!--Eastern Frisian-->
			<languagePopulation type="stq" populationPercent="0.0012" officialStatus="official_regional"/>	<!--Saterland Frisian-->
			<languagePopulation type="pfl" populationPercent="0" references="R1150"/>	<!--Palatine German-->
		</territory>
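Since the snippets were assembled by hand, a mechanical check before merging them into supplementalData.xml may catch slips such as the non-numeric populationPercent="0.pl" in the Sweden block. A minimal sketch using Python's standard xml.etree.ElementTree; it assumes each snippet is a single, closed <territory> element. The name `check_territory` and the checks themselves (duplicate type values, non-numeric or out-of-range percentages) are illustrative assumptions, not CLDR's own validation.

```python
# Sketch: sanity-check a hand-written <territory> snippet.
# These checks are assumptions about what "valid" means here,
# not CLDR's own test suite.
import xml.etree.ElementTree as ET
from typing import List


def check_territory(xml_text: str) -> List[str]:
    """Return human-readable problems found in one <territory> element."""
    problems = []
    territory = ET.fromstring(xml_text.strip())  # XML comments are discarded
    seen = set()
    for lp in territory.iter("languagePopulation"):
        code = lp.get("type")
        if code in seen:
            problems.append(f"duplicate type {code!r}")
        seen.add(code)
        value = lp.get("populationPercent", "")
        try:
            percent = float(value)
        except ValueError:
            problems.append(f"{code}: populationPercent {value!r} is not a number")
            continue
        if not 0 <= percent <= 100:
            problems.append(f"{code}: populationPercent {percent} out of range")
    return problems


if __name__ == "__main__":
    snippet = """
    <territory type="SE" population="9954420">
        <languagePopulation type="ku" populationPercent="0.84"/>  <!--Kurdish-->
        <languagePopulation type="pl" populationPercent="0.pl"/>  <!--Polish-->
        <languagePopulation type="ku" populationPercent="0.84"/>
    </territory>
    """
    for problem in check_territory(snippet):
        print(problem)
```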

comment:2 in reply to: ↑ 1 Changed 15 months ago by kent.karlsson14@…

Replying to kent.karlsson14@…:

<territory type="SE" gdp="498130000000" literacyPercent="99" population="9954420"> <!--Sweden-->

Sweden's population passed 10 million on 20 January 2017 (according to projections): http://www.scb.se/

comment:3 Changed 12 months ago by emmons

  • Status changed from new to accepted
  • Component changed from unknown to supplemental
  • Priority changed from assess to minor
  • Phase changed from dsub to rc
  • Milestone changed from UNSCH to 32
  • Owner changed from anybody to rick
  • Type changed from unknown to data

comment:4 Changed 11 months ago by tilljohanvia@…

So I'm looking at the Swedish numbers, and English seems to be compiled from the number of speakers who can reasonably handle the language – which makes sense given the purpose of the database – but for Sweden, important languages like Norwegian, Danish and German especially, but also French and Spanish, seem to be taken from ... the number of native speakers? Shouldn't they follow the same logic as English?

comment:5 Changed 11 months ago by kent.karlsson14@…

Danish and Norwegian are special cases, though not really taught in Swedish schools. Most Swedes can follow Norwegian somewhat, and most Swedes can follow a bit of (spoken) Danish (when well articulated, but usually only then, except for numbers which none of us can learn...).

As for being fluent in Norwegian or Danish, very few Swedes are. Most Swedes assume that Swedish is understood all over Scandinavia... Those Swedes who have lived in Denmark or Norway are the exception.

So, for Sweden, pick a number between 1 and 90 for Norwegian and Danish; or stick with the numbers given above.

English: Not only is English learnt to a high degree (much much more than German/French/Spanish) in schools, there are also very many films and TV series that are in English (and they are not dubbed, except when targeted for small children).

German: Though quite a few Swedes have studied German in school (maybe around the figure you cite, to different levels), very few are really fluent in German after finishing school, very few. Next to nothing on Swedish TV is in German. So I stuck with a low number.

French: Though some Swedes have studied French in school (maybe around the figure you cite, to different levels), far fewer are (to some degree) fluent in French (after finishing school). Next to nothing on Swedish TV is in French. So I stuck with a low number.

Spanish: Growing in schools, but mostly there is a large group of immigrants, and children of immigrants, from South America (far fewer from Spain). And they are fluent in Spanish. Few of those who only learn Spanish in school speak it fluently.

comment:6 Changed 11 months ago by tilljohanvia@…

Sure, the numbers for German, French and Spanish should be significantly lower than for English. English is far more important, which we can see in all surveys – the de facto second language of Sweden, without which you're shut out from higher education, have more difficulties navigating Swedish society and so on. They're not playing in the same league.

But is that really a good reason to accept one method for English, but not for the other languages? We already see the importance of the English language reflected in the results if we use the same methods and look at the same surveys. The numbers *are* much lower.

It's true that the fact that a large number of the Swedish population study German doesn't mean that everyone who took five years of German speaks it. But I think it's even more misleading to include numbers for one language that measure not native speakers but people who can use the language in a computer setting, and for another to completely ignore the pretty large number of Swedes who do speak it, counting only Germans who moved to Sweden.

(It is probably a pretty good approximation for Persian, Polish, Kurdish and other immigrant languages in the list, though, given the very small number of Swedes who learn these languages.)

comment:7 Changed 11 months ago by tilljohanvia@…

Regarding Norwegian and Danish: Isn't the point of the CLDR database to determine which languages are likely to be useful in different locations? If that is the case, putting Danish and Norwegian at ~1% is likely to be misleading. I agree that they are thorny special cases, but we don't have to pick a random number: we have approximations of how many Swedes consider themselves able to use the languages from e.g. the Eurobarometers (see tickets 10280 and 10282), which is sadly probably the best we have here. But since we DO have the numbers in there, why not use them?

comment:8 Changed 11 months ago by kent.karlsson14@…

For Danish and Norwegian (the latter not covered by the later referenced "Eurobarometer"), the given numbers are just as arbitrary as picking any number between 1 and 90. The real situation is as I described above.

As for German, I think that 23% is a GROSS overestimate (of the fraction of the population that is fluent). Remember that this is from a VERY tedious questionnaire, and I did not (quickly) find any information about response rate, target of questionnaire, or other methodology issues. (It is sometimes said that you can prove anything with (bad) statistics.) 23% seems more realistic for the fraction of the population that studied German in school (and then forgot most of it). Actually, the share is now even lower (19% in 2011, and then (<) 9% for "level 3"): http://skolvarlden.se/artiklar/tyska-spraket-ar-pa-dekis.

Spanish is given as (1+4)% (second+third lang.). That seems more realistic; probably an underestimate for Gothenburg and esp. Stockholm. As for the school (high school) percentage: "13.8 percent studied Spanish at level 3" (in 2010); also from the same article in Skolvärlden.

comment:9 follow-up: ↓ 10 Changed 10 months ago by tilljohanvia@…

Sorry for the late reply. Let me contextualize the discussion: If I understand the meaning of the CLDR database correctly, it's a tool to figure out what languages are potentially useful in certain territories. For me, I have a specific use case – compact language links on Wikipedia – but of course I mainly want the list to be generally useful.

23% is probably an overestimation, but I don't think it's much more off than merely going by number of native speakers in a country where German is one of the major languages taught in school. The results from the questionnaire should be questioned rather than taken as the truth. Native speakers, however, is an answer to a completely different question. It's not the solution.

Likewise, I agree that we have a problem with Norwegian and Danish – given that they could be construed as dialects of Scandinavian and have a very high mutual intelligibility in text, it's a question of definition, and there's no perfect number. I don't see why random number from a survey will

However, solving that by looking at number of native speakers only will leave the CLDR database much less practically useful. Danish and Norwegian are widely understood and read. As is German, to a lesser degree but more so than most languages.

TL;DR: Your criticism is relevant and has merit. I'd like better numbers than the ones I've submitted. However, I think that simply looking at native speakers for these three languages is even worse, and will diminish the usefulness of the database.
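For the use case described in comment 9, a consumer might simply rank a territory's languages by populationPercent and keep those above some threshold. The Python below is a hypothetical sketch: the function name and the 5% threshold are assumptions, and it does not describe how Wikipedia's compact language links actually select languages. It shows why the disputed German figure matters: with the proposal's 0.72%, German drops out for Sweden, while the 23% survey figure keeps it.

```python
# Hypothetical sketch: pick the languages "likely to be useful" in a territory
# by thresholding populationPercent. Threshold and function name are illustrative.
from typing import Dict, List


def useful_languages(percentages: Dict[str, float], threshold: float = 5.0) -> List[str]:
    """Return language codes at or above the threshold, highest share first."""
    ranked = sorted(percentages.items(), key=lambda item: item[1], reverse=True)
    return [code for code, percent in ranked if percent >= threshold]


if __name__ == "__main__":
    # Sweden, using a subset of the figures proposed in comment 1.
    se_proposed = {"sv": 97, "en": 89, "fi": 2.0, "ar": 1.55, "de": 0.72}
    # The same, but with German at the 23% survey figure discussed in this ticket.
    se_survey = dict(se_proposed, de=23)
    print(useful_languages(se_proposed))
    print(useful_languages(se_survey))
```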

comment:10 in reply to: ↑ 9 Changed 10 months ago by kent.karlsson14@…

Replying to tilljohanvia@…:

23% is probably an overestimation, but I don't think it's much more off than merely going by number of native speakers in a country where German is one of the major languages taught in school. The results from the questionnaire should be questioned rather than taken as the truth. Native speakers, however, is an answer to a completely different question. It's not the solution.

See below.

Likewise, I agree that we have a problem with Norwegian and Danish – given that they could be construed as dialects of Scandinavian and have a very high mutual intelligibility in text, it's a question of definition, and there's no perfect number. I don't see why random number from a survey will

Having just a list of <language,percentage> does not deal well with very closely related languages. But I am not about to suggest changing how this data file is structured.

However, solving that by looking at number of native speakers only will leave the CLDR database much less practically useful. Danish and Norwegian are widely understood and read. As is German, to a lesser degree but more so than most languages.

German is taught (to a lesser and lesser degree), but rarely remembered/used.

Not sure how reliable distance-maps like


In my personal opinion, Swedish and English are closer than Swedish and German (but the maps don't show that), even when discounting that Swedish has recently borrowed quite a bit from English. This is largely because both English and Swedish have historically borrowed a lot from French, and because word order and inflection are closer between Swedish and English than between Swedish and German (which has more inflections and a quite different word order).

Also, "on the streets" I hear Arabic or Spanish each a thousand times more often than German; I hear Hindi/Urdu, Chinese or Polish (each) a hundred times more often than German... On TV there is some German, but mostly because many channels show quite a lot of WWII (drama) documentaries (very few other programs are in German), subtitled of course.

I think that an overestimate for German would be a very bad idea and very misleading, while an underestimate, even a heavy one, here would not be bad at all.

For Swedish/Norwegian/Danish, I have no ready-made solution, since they are so close. But counting only those who actually speak those languages, it would be the figures I've given.

comment:11 Changed 9 months ago by rick

  • Status changed from accepted to closed
  • Resolution set to needs-more-info

Please review guidelines for additions/updates to language/population data, and you may re-open this bug with relevant details (if applicable).

On one hand, there is way too much information and discussion in this bug so the intent isn't very clear. On the other hand there isn't enough information about how/why each of the requested items is relevant and important to add to CLDR supplementary data.

