Unicode top 100 (was RE: Unicode 3.0 press statements)

From: Marco.Cimarosti@icl.com
Date: Fri Jan 21 2000 - 10:16:14 EST


I know, I know: the "top 100" languages list is utter nonsense and surely
does not fit the public relations needs of The Unicode Consortium.

However, as some people took the time to send me corrections and advice, I
tried to integrate them into the list, just for our amusement.

* John Cowan > "Azerbaijan has switched to Latin."
[I moved it]

* Joerg Knappen > Sunda uses Latin; Oromo uses Ethiopic.
[I moved them]

* Roozbeh Pournader > "Sindhi is written in Arabic script."
[I moved it]

* Thomas Chan > "... Other than Mandarin Chinese and Yue Chinese, the
other "Chinese" ones don't really have developed writing traditions, so the
question is sort of academic..."
[See next]

* John Cowan and I > similar concern for Italian dialects.
[I collapsed most "dialects" under the entry of the "national language"
spoken in the area, assuming that speakers of these languages would use the
"national language" in writing (especially on computers)]

* Kent Karlsson > "... That does not even cover all of the official
languages of the EU! So that "top 100" statement would be highly
UNimpressive..."
[EU languages are no more important than others; moreover, many other
languages are missing. Some of these languages (e.g. Hebrew) are relevant
for Unicode because they use a special script, or are tricky, or are "often
used" on computers, so I sort of added them without estimates]

* Janko Stamenovic > split Serbo-Croatian into Serbian (rough estimate:
8 to 10 million) and Croatian.
[The divorce is done: 10 million to Serbian and the rest to Croatian]

Here are the revised statement and the new list (ordered by writing system;
the numbers show the estimated number of people speaking each language, in
millions).

"Unicode supports the top 100 languages. Unicode also supports all the
official languages used in the EU and many other languages, some of which
require unique writing systems."

*** Latinate alphabet
332 SPANISH
322 ENGLISH
170 PORTUGUESE
 98 GERMAN
 76 JAVANESE
 72 FRENCH
 68 VIETNAMESE
 59 TURKISH
 46 ITALIAN
 44 POLISH
 31 AZERBAIJANI
 27 SUNDA
 26 ROMANIAN
 24 HAUSA
 20 DUTCH
 20 YORUBA
 18 MALAY (also written in Arabic)
 17 INDONESIAN
 17 IGBO
 17 TAGALOG
 15 HUNGARIAN
 12 CZECH
 11 CROATIAN
  9 MALAGASY
  9 RWANDA
  9 SOMALI
  9 ZULU
  9 SWEDISH
  8 NIGERIAN FULFULDE
  7 HAITIAN CREOLE FRENCH
    (all other official languages in the EU)

*** Greek alphabet
 12 GREEK

*** Cyrillic alphabet
170 RUSSIAN
 41 UKRAINIAN
 18 NORTHERN UZBEK
 10 BELARUSAN
 10 SERBIAN (also written in Latinate)
  9 BULGARIAN
  8 TATAR
  8 KAZAKH
  7 UYGHUR

*** Armenian alphabet
    (ARMENIAN)
  
*** Hebrew alphabet
    (HEBREW)
    (YIDDISH)

*** Arabic alphabet
175 ARABIC (all dialects)
 58 URDU
 31 FARSI
 30 WESTERN PANJABI
 20 SINDHI
 18 PASHTO

*** Thaana alphabet
    (MALDIVIAN)
  
*** Devanagari alphabet
182 HINDI
 65 MARATHI
 16 NEPALI

*** Bengali alphabet
189 BENGALI
 14 ASSAMESE

*** Gujarati alphabet
 44 GUJARATI

*** Gurmukhi alphabet
 26 EASTERN PANJABI

*** Oriya alphabet
 31 ORIYA

*** Tamil alphabet
 63 TAMIL

*** Telugu alphabet
 66 TELUGU

*** Kannada alphabet
 34 KANNADA

*** Malayalam alphabet
 34 MALAYALAM

*** Sinhala alphabet
 13 SINHALA

*** Thai alphabet
 35 THAI

*** Lao alphabet
    (LAO)

*** Myanmar alphabet
 22 BURMESE

*** Georgian alphabet
    (GEORGIAN)

*** Hangul script
 75 KOREAN (also uses CJK ideographs, a.k.a. hanja)

*** Ethiopic script
 17 AMHARIC
  9 OROMO

*** Cherokee script
    (CHEROKEE)

*** Canadian syllabic script
    (INUIT)

*** Khmer alphabet
  7 KHMER

*** Mongolian alphabet
    (MONGOLIAN)

*** Braille patterns
    (many languages worldwide)

*** Kana script
125 JAPANESE (also uses CJK ideographs, a.k.a. kanji)

*** CJK ideographs (a.k.a. hanzi, kanji, hanja)
885 MANDARIN CHINESE
 66 YUE CHINESE
282 (other Chinese dialects)

*** Yi script
    (YI)

*** Unknown (unwritten?)
 25 BHOJPURI
 24 MAITHILI
 21 AWADHI
 15 SARAIKI
 15 CEBUANO
 14 CHITTAGONIAN
 14 MADURA
 13 HARYANVI
 12 MARWARI
 12 MAGAHI
 11 CHHATTISGARHI
 10 DECCAN
  8 ILOCANO
  7 SHONA
  7 KURMANJI
  7 HILIGAYNON
  7 AKAN

THE END
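
For anyone who wants to play with the numbers, here is a minimal sketch
of how the list above could be tallied per writing system. It uses only
a handful of rows copied from the list, and the names ROWS,
speakers_by_script and ranked are made up for this illustration; it is
not a complete or authoritative count.

```python
from collections import defaultdict

# A few rows copied from the list above:
# (writing system, language, estimated speakers in millions).
# This is only a sample, not the whole list.
ROWS = [
    ("Latinate", "SPANISH", 332),
    ("Latinate", "ENGLISH", 322),
    ("Cyrillic", "RUSSIAN", 170),
    ("Cyrillic", "UKRAINIAN", 41),
    ("Arabic", "ARABIC", 175),
    ("Arabic", "URDU", 58),
    ("Devanagari", "HINDI", 182),
    ("CJK ideographs", "MANDARIN CHINESE", 885),
    ("CJK ideographs", "YUE CHINESE", 66),
]


def speakers_by_script(rows):
    """Sum the speaker estimates (in millions) for each writing system."""
    totals = defaultdict(int)
    for script, _language, millions in rows:
        totals[script] += millions
    return dict(totals)


def ranked(totals):
    """Return (script, total) pairs, largest total first."""
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    for script, total in ranked(speakers_by_script(ROWS)):
        print(f"{total:4d} {script}")
```

Languages listed without an estimate (the ones in parentheses) would
have to be left out or given a figure from some other source before
they could be counted this way.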

Ciao.
        Marco


