Unicode Common Locale Data Repository: Bug Tracking

CLDR Ticket #8029 (closed: fixed)

Opened 4 years ago

Last modified 3 years ago

Add language "groups"

Reported by: mark
Owned by: mark
Component: xxx-tools
Phase: dsub
Review: srl

Description (last modified by mark)

We've found that, for comparison across languages, it is useful to have groupings like the following. These are roughly aligned with language families, but may deviate where convenient for comparison. They could, of course, be extended or refined over time.

For example, they could be used in the by-type charts, such as http://www.unicode.org/cldr/charts/26/by_type/date_&_time.gregorian.html, although the groups would need to be extended to cover the common/main locales.

Language Group Code Name
germanic en English
germanic en_GB English (UK)
germanic af Afrikaans
germanic nl Dutch
germanic de German
germanic da Danish
germanic nb Norwegian Bokmål
germanic sv Swedish
germanic is Icelandic
romance pt Portuguese
romance pt_PT European Portuguese
romance gl Galician
romance es Spanish
romance es_419 Latin American Spanish
romance ca Catalan
romance it Italian
romance ro Romanian
romance fr French
romance fr_CA French (Canada)
slavic hr Croatian
slavic bs Bosnian
slavic sr Serbian
slavic sl Slovenian
slavic cs Czech
slavic sk Slovak
slavic pl Polish
slavic bg Bulgarian
slavic mk Macedonian
slavic ru Russian
slavic uk Ukrainian
baltic lt Lithuanian
baltic lv Latvian
other-indo el Greek
other-indo fa Persian
other-indo hy Armenian
other-indo ka Georgian
other-indo sq Albanian
indic ur Urdu
indic hi Hindi
indic bn Bengali
indic gu Gujarati
indic mr Marathi
indic ne Nepali
indic pa Punjabi
indic si Sinhala
dravidian ta Tamil
dravidian te Telugu
dravidian ml Malayalam
dravidian kn Kannada
cjk zh Chinese
cjk zh_Hant Traditional Chinese
cjk zh_HK Chinese (Hong Kong)
cjk ja Japanese
cjk ko Korean
turkic tr Turkish
turkic az Azerbaijani
turkic kk Kazakh
turkic ky Kyrgyz
turkic uz Uzbek
uralic et Estonian
uralic fi Finnish
uralic hu Hungarian
tai th Thai
tai lo Lao
semitic ar Arabic
semitic he Hebrew
malayic id Indonesian
malayic ms Malay
malayic fil Filipino
austroasiatic vi Vietnamese
austroasiatic km Khmer
other sw Swahili
other zu Zulu
other am Amharic
other eu Basque
other mn Mongolian
other my Burmese
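As one illustration of how the table might drive the by-type charts, the Python sketch below groups locale codes by language group and orders a chart's locales by group, then by table position. It is a hypothetical helper, not CLDR tooling: `ROWS` is a small excerpt of the table above, and `group_locales` and `chart_order` are illustrative names. Locales not in the table (which, as noted above, does not yet cover common/main) are placed last, in code order.

```python
from collections import defaultdict

# Hypothetical excerpt of the table above, as (group, code, name) rows.
ROWS = [
    ("germanic", "en", "English"),
    ("germanic", "de", "German"),
    ("romance", "fr", "French"),
    ("romance", "es", "Spanish"),
    ("uralic", "fi", "Finnish"),
    ("uralic", "hu", "Hungarian"),
]


def group_locales(rows):
    """Map each group to its locale codes, keeping table order."""
    groups = defaultdict(list)
    for group, code, _name in rows:
        groups[group].append(code)
    return dict(groups)


def chart_order(rows, locales):
    """Sort locales by group (first appearance in the table), then by
    table position; locales missing from the table go last, sorted."""
    position = {code: i for i, (_group, code, _name) in enumerate(rows)}
    group_of = {code: group for group, code, _name in rows}
    group_rank = {}
    for group, _code, _name in rows:
        group_rank.setdefault(group, len(group_rank))
    known = [loc for loc in locales if loc in position]
    unknown = sorted(loc for loc in locales if loc not in position)
    known.sort(key=lambda loc: (group_rank[group_of[loc]], position[loc]))
    return known + unknown


print(chart_order(ROWS, ["hu", "xx", "fr", "en"]))  # ['en', 'fr', 'hu', 'xx']
```

Sorting on the group rank first keeps related languages adjacent in a chart even if the source table is later reordered.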


Change History

comment:1 Changed 4 years ago by mark

  • Description modified
  • Summary changed from Add language "categories" to Add language "groups"

comment:2 Changed 4 years ago by srl

The above list is probably derivable from this DBpedia query, with some filtering. I don't think we want to hand-curate this list.

Last edited 4 years ago by srl

comment:3 Changed 4 years ago by shervin

I think this is a good resource, but we might need to check it; in some cases Wikipedia turns out to be controversial. Also, in the result for the query Steven provided above, neither "fa" nor "fas" (Persian) can be found. I don't know why that is, since ordering by ?fam or ?iso shows them.
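A quick cross-check can catch gaps like the missing Persian entry. The sketch below assumes the curated list is a sequence of acceptable code alternatives per language (for example, the two-letter and three-letter ISO 639 codes) and that the query results have already been reduced to a flat list of codes; the function name `missing_languages` and the sample data are hypothetical.

```python
def missing_languages(expected, result_codes):
    """Return the alternative-code tuples in `expected` that have no
    member among `result_codes`."""
    found = set(result_codes)
    return [
        tuple(sorted(alternatives))
        for alternatives in expected
        if not found.intersection(alternatives)
    ]


# Hypothetical data: the curated list and the codes one query returned.
EXPECTED = [("fa", "fas"), ("fi", "fin"), ("hy", "hye")]
RESULT_CODES = ["fi", "hye", "tr"]

print(missing_languages(EXPECTED, RESULT_CODES))  # [('fa', 'fas')]
```

Run against a truncated result set, a check like this flags languages that fell outside the returned rows rather than languages absent from DBpedia itself, so its output is only meaningful once the full result set has been fetched.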

comment:4 Changed 4 years ago by mark

  • Owner changed from anybody to mark
  • Priority changed from assess to minor
  • Status changed from new to accepted
  • Component changed from unknown to tools
  • Milestone changed from UNSCH to 27

Needs to be extended to CLDR main; for now, it is only in our tooling.

comment:5 Changed 4 years ago by fossati@…

You are facing a paging issue.
The official DBpedia SPARQL endpoint limits result sets to 10,000 records.
The query suggested by Steven returns 17,463 records.
You can work around that by issuing a second query with an OFFSET value.
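The arithmetic is simple: 17,463 rows at 10,000 per page needs ceil(17463 / 10000) = 2 requests, at OFFSET 0 and OFFSET 10000. A minimal Python sketch that generates the paged queries follows. The base query is a hypothetical stand-in (the query linked in the ticket is not reproduced here), and `paged_queries` is an illustrative helper, not part of any DBpedia client. The base query includes an ORDER BY because SPARQL does not guarantee a stable row order between requests without one.

```python
import math

# Hypothetical stand-in for the linked query; ORDER BY keeps pages stable.
BASE_QUERY = """SELECT ?lang ?fam WHERE {
  ?lang dbo:languageFamily ?fam .
}
ORDER BY ?lang"""


def paged_queries(base_query, total, page_size=10_000):
    """Split one SELECT into ceil(total / page_size) LIMIT/OFFSET pages."""
    pages = math.ceil(total / page_size)
    return [
        f"{base_query}\nLIMIT {page_size}\nOFFSET {i * page_size}"
        for i in range(pages)
    ]


for query in paged_queries(BASE_QUERY, 17_463):
    print(query.splitlines()[-1])  # OFFSET 0, then OFFSET 10000
```

When the total is not known in advance, the same idea works as a loop that keeps increasing OFFSET until a page returns fewer than `page_size` rows.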

comment:6 Changed 4 years ago by fossati@…

Here is a more human-readable version with additional ISO codes when available:
1st query
2nd query

comment:8 Changed 4 years ago by fossati@…

Whoops, I forgot to filter non-English language labels.
Here are the updated queries:
1st query
2nd query
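Language labels in DBpedia carry language tags, so one way to drop non-English labels is on the server, with a SPARQL filter such as `FILTER(langMatches(lang(?label), "en"))`. The updated queries themselves are not reproduced here, so it is an assumption that they filter this way. The Python sketch below does the equivalent client-side on a parsed SPARQL 1.1 JSON result document, where a literal binding may carry an `xml:lang` key; `english_rows` and the sample data are hypothetical.

```python
def english_rows(results, var="label"):
    """Keep result rows whose `var` literal is tagged "en" or "en-*",
    mirroring langMatches(lang(?var), "en")."""
    kept = []
    for row in results["results"]["bindings"]:
        tag = row.get(var, {}).get("xml:lang", "").lower()
        if tag == "en" or tag.startswith("en-"):
            kept.append(row)
    return kept


# Hypothetical parsed SPARQL 1.1 JSON results with one English label.
SAMPLE = {
    "head": {"vars": ["label"]},
    "results": {"bindings": [
        {"label": {"type": "literal", "value": "Persian", "xml:lang": "en"}},
        {"label": {"type": "literal", "value": "Persisch", "xml:lang": "de"}},
    ]},
}

print([r["label"]["value"] for r in english_rows(SAMPLE)])  # ['Persian']
```

Filtering on the server is usually preferable, since it also shrinks the result set that counts against the endpoint's row limit.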

comment:9 Changed 4 years ago by mark

  • Keywords working added

comment:10 Changed 3 years ago by mark

  • Status changed from accepted to reviewing
  • Review set to srl

Leaving the extension to more languages to ticket:8208.

comment:11 Changed 3 years ago by srl

  • Status changed from reviewing to closed
  • Resolution set to fixed

ok for now

