Common Locale Data Repository: Bug Tracking

CLDR Ticket #9933 (closed data: needs-more-info)

Opened 7 months ago

Last modified 2 months ago

Labels to 3166-2 subdivisions

Reported by: doppelbauer@…
Owned by: anybody
Component: main
Data Locale:
Phase: dsub
Review:
Weeks:
Data Xpath:
Xref:

Description

The attached list contains labels for ISO 3166-2 subdivisions.
The data is generated from "wikidata.json".
I am not a lawyer - maybe you are able to use Wikipedia?

Attachments

wikidata-3166.txt.bz2 (1.2 MB) - added by anonymous 7 months ago.
wikidata-3166-f3.txt.zip (1.6 MB) - added by kent.karlsson14@… 7 months ago.
Decompress, unescape \u-escapes, fix multiple JSON errors, pretty-print, sort on keys, fix certain obvious language tag errors; recompress

Change History

comment:1 Changed 7 months ago by kent.karlsson14@…

Decompress, unescape \u-escapes, fix multiple JSON errors, pretty-print, sort on keys, fix certain obvious language tag errors (there may be more). The result is attached.
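The mechanical part of that cleanup (decompress, unescape, pretty-print, sort on keys) could look roughly like the Python sketch below. It assumes the attachment is bz2-compressed UTF-8 text containing a single valid JSON document; the function name `normalize_labels` is hypothetical. It does not repair malformed JSON or wrong language tags, which in the attached file had to be fixed separately.

```python
import bz2
import json


def normalize_labels(compressed: bytes) -> str:
    """Decompress bz2 data, parse it as JSON, and re-serialize it readably.

    Assumption: the payload is one valid UTF-8 JSON document.
    """
    text = bz2.decompress(compressed).decode("utf-8")
    # json.loads turns \uXXXX escapes into the actual characters.
    data = json.loads(text)
    # ensure_ascii=False keeps those characters unescaped in the output;
    # sort_keys=True sorts object keys at every nesting level.
    return json.dumps(data, ensure_ascii=False, indent=2, sort_keys=True)
```

Writing the result back out with `bz2.compress(result.encode("utf-8"))` would cover the final recompress step.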

I note that there are issues with the names given in the file (though I have NOT fixed any of them). Here are some issues:

  • Use of ASCII apostrophe in names (should be true apostrophe).
  • Many instances of non-translation.
  • Some questionable translations. For instance, Swedish "län" is sometimes transliterated/adapted to another language, though that (probably) results in a word that often does not make any sense in that language. "Län" is also translated to "county" ("grevskap" in Swedish), which is a really old term. It is much better to call them regions, or even provinces, rather than counties. I know it is common to use "county" for "län", but I do NOT recommend it; it gives the wrong connotations. We haven't had counties for several hundred years (https://sv.wikipedia.org/wiki/Sveriges_grevskap). There is a local-to-Sweden split, in that "regions" are suggested as a new (coarser) division, and maybe some change in responsibilities. But those differences in concept are lost (or losable) in an international context. Indeed, some "län" consider "themselves" "regions" already. [The actual change of division, though, is yet again postponed... Compounding this, there are several different top-level administrative divisions: one for healthcare, another for the police, another for the tax office, etc. (https://sv.wikipedia.org/wiki/Regionindelning_f%C3%B6r_Sveriges_myndigheter). So it's a mess...]
  • Sometimes the "län/county/region/prefecture/whatever" indication is lost. Sometimes that makes a difference; e.g. Stockholm (= Stockholms stad) is just a small part of Stockholms län/region/province, and Dalarna is an old informal region ("landskap") that only partially coincides with "Dalarnas län". Similarly, Valencia (which would be the city) is just a part of the Valencia province, but the word for "province" is lost in some translations. Etc., for many other translations.
  • There should be some kind of consistency; e.g. if "province" (or similar) is included for some translations, it should usually be included for all translations; likewise, if it is included for some subdivisions of a country, it should usually be included for all subdivisions of that country; likewise for non-inclusion of such a term in the name.
  • Sometimes there are parenthetical remarks that should either be removed or otherwise be made non-parenthetical.
  • Need to check that the divisions given are the current ones, rather than some old or informal division. Also need to check that the local language names are the correct ones.
  • I'm sure there are many more issues.
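
A few of the issues listed above are mechanical enough to be flagged automatically for human review. The Python sketch below is only an illustration: it assumes the data has the shape subdivision code -> {language tag: name}, which may not match the attached file, and the function name `find_issues` and the issue labels are hypothetical. The "same-as-english" check is a rough heuristic for non-translation and will also flag names that are legitimately identical across languages.

```python
from typing import Dict, List, Tuple

# Hypothetical shape: subdivision code -> {language tag: name}
Labels = Dict[str, Dict[str, str]]


def find_issues(labels: Labels) -> List[Tuple[str, str, str]]:
    """Return (code, language, issue) triples for names that need review."""
    issues = []
    for code in sorted(labels):
        names = labels[code]
        english = names.get("en")
        for lang in sorted(names):
            name = names[lang]
            # ASCII apostrophe (U+0027) where a true apostrophe is expected.
            if "'" in name:
                issues.append((code, lang, "ascii-apostrophe"))
            # Parenthetical remark inside the name.
            if "(" in name or ")" in name:
                issues.append((code, lang, "parenthetical"))
            # Heuristic only: identical to the English name, so possibly untranslated.
            if lang != "en" and english is not None and name == english:
                issues.append((code, lang, "same-as-english"))
    return issues
```

Checks for consistent use of terms such as "province", or for whether a division is current, need knowledge of each country and language and are not attempted here.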

comment:2 Changed 2 months ago by mark

  • Status changed from new to closed
  • Resolution set to needs-more-info

For the reasons Kent mentioned, we can't take data directly from Wikipedia. When we did English, for example, we found many problems in Wikipedia. Much of the data is great, but it required comparison to other sources and many fixes.

If someone is willing to sign up to do that for other languages...
