Unicode Common Locale Data Repository: Bug Tracking

CLDR Ticket #9081 (accepted data)

Opened 18 months ago

Last modified 14 months ago

"Svalbard" and "Jan Mayen" subdivisions of Norway

Reported by: Matsbla <mats.gbproject@…>
Owned by: mark
Component: supplemental
Phase: rc
Xref: ticket:9079

Description

Check out this page. Some subdivisions, like "Hong Kong", also have a territory code. They are marked on the page, for example CN-91 = HK (Hong Kong SAR China):
http://www.unicode.org/cldr/charts/latest/supplemental/territory_subdivisions.html

However, I can't find this type of mapping in the CLDR core. Is it there? If not, it would be great to have it there!

I noticed that Norway has "Svalbard" and "Jan Mayen" listed as subdivisions. They also have their own territory code, but this is not marked on the page I linked to.

Change History

comment:1 Changed 17 months ago by srl

  • Data Xpath set to 9079

Similarly, CP is a subregion of FR; see ticket:9079.

comment:2 Changed 17 months ago by srl

  • Idea: have NO-SJ under NO, and list NO-21 and NO-22 as sub-subdivisions of NO-SJ

comment:3 follow-up: ↓ 4 Changed 17 months ago by Matsbla <mats.gbproject@…>

no, "SJ" is a valid ISO code, it can work without NO in front. The change you suggest will make CLDR less coherent with all other platforms using ISO codes...

It is not really a problem that areas have both a territory code and a subdivision code; it would just be a good idea to map them together, to be able to keep an overview of them.

comment:4 in reply to: ↑ 3 Changed 17 months ago by srl

Replying to Matsbla <mats.gbproject@…>:

no, "SJ" is a valid ISO code, it can work without NO in front.

Right. I was not proposing to make SJ invalid by itself.

The change you suggest will make CLDR less coherent with all other platforms using ISO codes...

It is not really a problem that areas have both a territory code and a subdivision code; it would just be a good idea to map them together, to be able to keep an overview of them.

My proposal was this, conceptually (using bullet items to show hierarchy):

  • SJ Svalbard and Jan Mayen = NO-SJ
  • NO Norway
    • NO-03 Oslo
    • NO-SJ Svalbard and Jan Mayen
      • NO-21 Svalbard
      • NO-22 Jan Mayen
    • NO-01 Østfold

probably something like:

<subgroup type="NO" contains="04 12 14 22 02 03 05 07 20 01 10 19 08 06 09 16 17 21 18 11 15 SJ"/>
<subgroup type="NO" subtype="SJ" contains="21 22"/>

but as you noted, there doesn't seem to be structure to indicate the alias relationship here.
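To make the proposed nesting concrete, here is a minimal Python sketch that parses the two `subgroup` elements above into a parent-to-children table. The `<root>` wrapper and the function names are assumptions added for illustration; they are not CLDR structure or tooling. The second helper reports codes that are listed both directly under the region and under one of its subgroups, which is the case for 21 and 22 in the XML as written.

```python
import xml.etree.ElementTree as ET

# The two elements from the proposal, wrapped in a hypothetical <root>
# element so that they parse as a single document.
PROPOSAL = """<root>
<subgroup type="NO" contains="04 12 14 22 02 03 05 07 20 01 10 19 08 06 09 16 17 21 18 11 15 SJ"/>
<subgroup type="NO" subtype="SJ" contains="21 22"/>
</root>"""


def build_containment(xml_text):
    """Map each parent code ("NO" or "NO-SJ") to its list of child codes."""
    tree = {}
    for group in ET.fromstring(xml_text).iter("subgroup"):
        region = group.get("type")
        subtype = group.get("subtype")
        parent = f"{region}-{subtype}" if subtype else region
        children = [f"{region}-{code}" for code in group.get("contains").split()]
        tree.setdefault(parent, []).extend(children)
    return tree


def listed_at_both_levels(tree, region):
    """Return codes listed directly under the region and also under a subgroup."""
    top = set(tree.get(region, []))
    nested = set()
    for parent, children in tree.items():
        if parent != region and parent.startswith(region + "-"):
            nested.update(children)
    return sorted(top & nested)


containment = build_containment(PROPOSAL)
print(containment["NO-SJ"])                      # ['NO-21', 'NO-22']
print(listed_at_both_levels(containment, "NO"))  # ['NO-21', 'NO-22']
```

If the bullet hierarchy is the intended result, 21 and 22 would presumably be dropped from the top-level `contains` list, so that the second call returns an empty list.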

comment:5 Changed 17 months ago by emmons

  • Status changed from new to accepted
  • Data Xpath 9079 deleted
  • Xref set to 9079
  • Priority changed from assess to medium
  • Phase changed from dsub to rc
  • Milestone changed from UNSCH to upcoming
  • Owner changed from anybody to mark
  • Type changed from unknown to data

comment:6 Changed 17 months ago by mark

  • Cc verdyp@… added

The problem is that 3166-2 doesn't always mark which subdivisions are equivalent to country codes, so we have to do some research to compensate for that deficiency. Will try to do that for this release, but might not have time to do that research. Help would be great.

comment:7 Changed 17 months ago by verdyp@…

That research does not take long for the few territories listed in ISO 3166-1. Those that are independent, self-governing and recognized countries are well known, and there are few codes to look for in ISO 3166-1 for which it is obvious from many sources that they are also part of another country, whose data in ISO 3166-2 lists the same territories.
Not many countries have dependencies listed separately in ISO 3166-1: GB (but its dependencies, as well as the Crown dependencies, are officially not part of GB, so they are not in ISO 3166-2), FR, NL, US, CN, ES, DK, NO. All we have to do is check these few countries to see if they have codes also listed in ISO 3166-2 matching the same territory. Those should also be aliased in the IANA database of language subtags for BCP 47 (and tickets opened there too).
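The check described here could be roughed out as a name comparison restricted to the listed parent countries. The following is only a sketch: the two small name tables contain just the entries mentioned in this ticket, the function name is hypothetical, and substring matching is a crude heuristic that yields candidates for manual review, not confirmed equivalences.

```python
# Parent countries named in the comment above.
PARENTS_TO_CHECK = ["GB", "FR", "NL", "US", "CN", "ES", "DK", "NO"]

# Sample data limited to entries mentioned in this ticket (illustrative only).
TERRITORY_NAMES = {
    "HK": "Hong Kong SAR China",
    "SJ": "Svalbard and Jan Mayen",
}
SUBDIVISION_NAMES = {
    "CN-91": "Hong Kong",
    "NO-01": "Østfold",
    "NO-03": "Oslo",
    "NO-21": "Svalbard",
    "NO-22": "Jan Mayen",
}


def candidate_overlaps(parents, subdivisions, territories):
    """Return (subdivision, territory) pairs whose names suggest an overlap."""
    candidates = []
    for code, name in sorted(subdivisions.items()):
        parent = code.split("-", 1)[0]
        if parent not in parents:
            continue
        for territory, territory_name in sorted(territories.items()):
            # Heuristic: the subdivision name occurs inside the territory name.
            if name.casefold() in territory_name.casefold():
                candidates.append((code, territory))
    return candidates


for subdivision, territory in candidate_overlaps(
    PARENTS_TO_CHECK, SUBDIVISION_NAMES, TERRITORY_NAMES
):
    print(f"{subdivision} may overlap with {territory}")
```

Names that differ between the two standards would be missed by this comparison, so a real pass would still need the manual research mentioned in comment:6.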

comment:8 Changed 16 months ago by Matsbla <mats.gbproject@…>

Hi, I looked into this and found this overview:
https://en.wikipedia.org/wiki/ISO_3166-2#Subdivisions_included_in_ISO_3166-1

In addition, Serbia lists 5 districts that cover the area of Kosovo [XK]:
Kosovsko-Pomoravski okrug [RS-29]
Kosovsko-Mitrovački okrug [RS-28]
Prizrenski okrug [RS-27]
Pećki okrug [RS-26]
Kosovski okrug [RS-25]

Those should also be aliased in the IANA database of language subtags for BCP 47 (and tickets opened there too).

Does the IANA database of language subtags contain information about subdivisions?

comment:9 Changed 14 months ago by doug@…

Does the IANA database of language subtags contain information about subdivisions?

No, subdivisions are out of scope for the Language Subtag Registry. You can use the -u- extension together with the CLDR subdivision mechanism to meet this need, when genuine language differences exist.
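As an illustration of the -u- mechanism, the sketch below builds a tag that carries a subdivision under the `sd` key, using the Unicode subdivision ID form (region code plus subdivision suffix, lowercased, no hyphen). The function names are hypothetical, the language tag `nb-NO` is just an example, and the code does not check whether the resulting ID is present in CLDR's validity data.

```python
import re


def to_unicode_subdivision_id(iso_3166_2_code):
    """Convert an ISO 3166-2 style code such as "NO-21" to the form "no21"."""
    match = re.fullmatch(r"([A-Za-z]{2})-([A-Za-z0-9]{1,3})", iso_3166_2_code)
    if match is None:
        raise ValueError(f"not an ISO 3166-2 style code: {iso_3166_2_code!r}")
    return (match.group(1) + match.group(2)).lower()


def tag_with_subdivision(base_tag, iso_3166_2_code):
    """Append a -u-sd- subdivision to a tag that has no -u- extension yet."""
    if "-u-" in base_tag.lower():
        # Merging into an existing -u- extension is out of scope for this sketch.
        raise ValueError("base tag already has a -u- extension")
    return f"{base_tag}-u-sd-{to_unicode_subdivision_id(iso_3166_2_code)}"


print(tag_with_subdivision("nb-NO", "NO-21"))  # nb-NO-u-sd-no21
```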

comment:10 Changed 14 months ago by doug@…

Those should also be aliased in the IANA database of language subtags for BCP 47 (and tickets opened there too).

Duplicate coding in the Language Subtag Registry causes matching problems and provides virtually no benefit. It is avoided whenever possible.

comment:11 Changed 14 months ago by verdy_p@…

It is NOT avoided whenever possible, given that they already exist as *aliases* and they all indicate the replaced form.

BCP 47 subtags have plenty of aliases; this is part of the standard itself, and also a major reason for its stability (as opposed to ISO codes for languages or regions).

BCP 47 fully describes how these aliases work, along with the relevant properties in the IANA database. "Preferred" tags/subtags are not really aliases, given that there may be some ambiguity in their replacement, but "replaced" tags/subtags really do define aliases and indicate a canonical form, so that no other data is needed under the aliases, and so that data tagged with those aliases can be found even when using the replaced tag/subtag.

Unicode itself also has its own aliases (e.g. for scripts), many of them are inherited (but not all, notably for scripts where they have been added directly). CLDR itself adds its own aliases. They also ensure compatibility and stability of some rules (or with some assumptions made by past implementations).

Without those aliases defined, data that already exist and that use *valid* BCP 47 tags but not their canonical equivalents will not be found when looking up data using only the canonical tags/subtags. With those aliases defined, at least they can be used as first fallbacks for missing data with the canonical tags.

Aliases are part of the necessary fallback mechanism (and storing data with these fallbacks is not necessary if they can be directly stored by putting values in the new replaced tags without conflicts).
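The fallback idea described in this comment can be sketched as follows. This only illustrates the argument made here and is not a description of how BCP 47, the registry, or CLDR implement lookup. The alias table holds two deprecated language subtags with their preferred values, the `lookup` function and the sample data are hypothetical, and only bare language subtags are handled.

```python
# Deprecated subtag -> preferred subtag (two sample entries).
ALIASES = {"iw": "he", "in": "id"}


def lookup(data, tag, aliases):
    """Find data for a tag, trying the canonical form first, then its aliases."""
    canonical = aliases.get(tag, tag)
    if canonical in data:
        return data[canonical]
    # Fall back to data that is still stored under an alias of the canonical tag.
    for alias, target in aliases.items():
        if target == canonical and alias in data:
            return data[alias]
    return None


# Hypothetical store where Hebrew data is still keyed by the old subtag.
data = {"iw": "resources stored under the old tag", "id": "resources for id"}

print(lookup(data, "he", ALIASES))  # found through the alias "iw"
print(lookup(data, "in", ALIASES))  # resolved to the canonical "id"
```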

There's no need to open new tickets in the IANA database given that these codes already exist there. But tickets may be necessary to add missing "Replaced:" or "Preferred:" properties.

comment:12 Changed 14 months ago by doug@…

I welcome the opportunity to correct factual errors and misunderstanding about BCP 47 via private email, instead of as comments to this unrelated CLDR ticket.
