CLDR Ticket #9000 (accepted data)
Add subdivision names in more languages
Reported by: mark | Owned by: mark
The goal is not complete coverage, but rather to cover the subdivisions in the main languages of each country, plus other subdivisions where the data is relatively easy to extract.
Here's a draft process, but this will need development and refinement as we go along.
- Start with a goal of at least one official (de jure or de facto) language for each country. However, based on availability and quality of data, that scope could be expanded (e.g. adding the names of German Länder in Russian). Initially limit to only "modern coverage" CLDR languages.
- Document the process used to clean up the English names (techniques for resolving conflicts, producing more customary names: e.g. "State of California" => "California") so that translators have something to start with.
- Extract native-language names for subdivisions from Wikipedia data and/or other sources.
- Produce spreadsheets for each language listing the subdivision code, English name, and possible native names, and perhaps also Wikipedia links.
- Distribute these to translators for verification, probably in online-spreadsheet form.
- Process the resulting data into XML format. Residual conflicts are sent back to translators for review.
Note: we've found it best to do the last two steps (distribution to translators and processing into XML) for one or two languages first, to verify that the process works, before opening it up to more languages.
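As a rough illustration of the name cleanup and XML processing steps, here is a minimal Python sketch. Everything in it is an assumption made for illustration, not the tooling actually used for this ticket: the function names, the affix rules, the ISO 3166-2 style input codes (such as "US-CA"), and the output layout (a `subdivision` element with a lowercased `type` attribute under `localeDisplayNames/subdivisions`) are all hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical cleanup rules: strip generic administrative terms to get the
# more customary name. A real rule set would need per-country review.
_PREFIX = re.compile(r"^(State|Province|Region|Department) of ")
_SUFFIX = re.compile(r" (Province|Region|Department)$")


def clean_english_name(name):
    """Normalize whitespace and drop a generic prefix or suffix."""
    name = " ".join(name.split())
    name = _PREFIX.sub("", name)
    name = _SUFFIX.sub("", name)
    return name


def split_conflicts(names):
    """Separate unambiguous names from residual conflicts.

    names maps a subdivision code (assumed form "CC-XXX") to a name.
    Returns (unique, conflicts), where conflicts maps (country, name)
    to the sorted codes that share that name within one country.
    """
    by_key = {}
    for code, name in names.items():
        country = code.split("-")[0]
        by_key.setdefault((country, name), []).append(code)
    unique, conflicts = {}, {}
    for key, codes in by_key.items():
        if len(codes) == 1:
            unique[codes[0]] = key[1]
        else:
            conflicts[key] = sorted(codes)
    return unique, conflicts


def to_ldml(names):
    """Serialize code -> name pairs into an assumed LDML-like layout."""
    ldml = ET.Element("ldml")
    display = ET.SubElement(ldml, "localeDisplayNames")
    subdivisions = ET.SubElement(display, "subdivisions")
    for code in sorted(names):
        # Assumption: the type attribute is the code lowercased, no hyphen.
        attrs = {"type": code.replace("-", "").lower()}
        element = ET.SubElement(subdivisions, "subdivision", attrs)
        element.text = names[code]
    return ET.tostring(ldml, encoding="unicode")
```

In this sketch, cleanup can itself create a conflict (for instance, a province and a city ending up with the same name once "Province" is stripped). Such entries land in `conflicts` rather than in the XML, which corresponds to sending residual conflicts back to translators for review.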
- Status changed from new to accepted
- Component changed from unknown to other
- Priority changed from assess to medium
- Phase changed from dsub to final
- Milestone changed from UNSCH to 29
- Owner changed from anybody to mark
- Type changed from unknown to data