From: Philippe Verdy (verdy_p@wanadoo.fr)
Date: Wed Nov 07 2007 - 15:55:20 CST
Rick McGowan wrote:
> The Unicode CLDR committee is planning to release a minor version, 1.5.1,
> by the end of November. There are a few changes in the specification
> associated with this change.
>
> http://unicode.org/draft/reports/tr35/tr35.html
> Notable changes include:
> * Added C10. Likely Subtags for locale IDs or language tags.
There is one problem with a "private use" territory code currently in use (QU):
[quote[TR35]]
3. Identifiers
(...)
A locale ID is an extension of a language ID, and thus the structure and
field values are based on [BCP47]. The registry of data for that
successor is now being maintained by IANA. The canonical form of a locale
ID uses "_" instead of the "-" used in [BCP47]; however, implementations
providing APIs for CLDR locale IDs should treat "-" as equivalent to "_"
on input.
(...)
Locale Field Definitions
--------------  ----------  ------------------------------------------
Field           Allowable   Allowable values
                Characters
--------------  ----------  ------------------------------------------
(...)
territory_code  ASCII       [BCP47] subtag values marked as Type:
                letters,    region, or any UN M.49 code that doesn't
                numbers     correspond to a [BCP47] region subtag.
                            There are three private use codes defined
                            in LDML:
                              QO  Outlying Oceania
                              QU  European Union
                              ZZ  Unknown or Invalid Territory
                            The private use codes from XA..XZ will
                            never be used by CLDR, and are thus safe
                            for use for other purposes by
                            applications using CLDR data.
--------------  ----------  ------------------------------------------
[/quote[TR35]]
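To make the quoted rules concrete, here is a minimal Python sketch of how an
implementation might accept "-" or "_" on input and sort territory codes into
the groups the table describes. The function names and the category labels
are my own assumptions for illustration, not anything defined by LDML, and
the sketch only handles the separator (not casing or registry lookups).

```python
import re

# Private use territory codes defined by LDML, as listed in the quoted table.
LDML_PRIVATE_USE = {
    "QO": "Outlying Oceania",
    "QU": "European Union",
    "ZZ": "Unknown or Invalid Territory",
}


def normalize_locale_id(locale_id: str) -> str:
    """Treat "-" as equivalent to "_" on input and return the "_" form."""
    return locale_id.replace("-", "_")


def classify_territory(code: str) -> str:
    """Return a hypothetical category label for a territory code."""
    code = code.upper()
    if code in LDML_PRIVATE_USE:
        return "ldml-private-use"
    if re.fullmatch(r"X[A-Z]", code):
        # XA..XZ: never used by CLDR, so free for application use.
        return "application-private-use"
    if re.fullmatch(r"[A-Z]{2}|[0-9]{3}", code):
        # Well-formed, but validity depends on the registry or UN M.49,
        # which this sketch does not consult.
        return "registry-lookup-needed"
    return "malformed"
```

With these definitions, normalize_locale_id("en-QU") gives "en_QU", and
classify_territory("QU") lands in the LDML private use group, whereas "EU"
would have to be looked up in the registry.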
Now let's look at the normative [BCP-47] reference:
[quote[BCP-47]]
2.2.4. Region Subtags
(...)
The following rules apply to the region subtags:
(...)
2. All two-character subtags following the primary subtag were
defined in the IANA registry according to the assignments found
in [ISO3166-1] ("Codes for the representation of names of
countries and their subdivisions -- Part 1: Country codes") using
the list of alpha-2 country codes, or using assignments
subsequently made by the ISO 3166 maintenance agency or governing
standardization bodies.
[/quote]
Note that [BCP47] cites [ISO3166-1] as a source of codes, but it ***forgets
to list it among the normative references*** at the end of the document.
It is also imprecise about which list is actually used: it only gives the
title of the whole document in the running text ("Codes for the
representation of names of countries and their subdivisions -- Part 1:
Country codes") and refers to the "list of alpha-2 country codes". It speaks
of "assignments", but does not indicate their normative status.
Starting from there, I found this official page:
http://www.iso.org/iso/iso-3166-1_decoding_table where the "EU" code is
shown on a yellow background, described as "exceptional reservations". It
links to this page:
http://www.iso.org/iso/customizing_iso_3166-1.htm, which says:
[quote]
To avoid transitional application problems and to aid users who require
specific additional code elements for the functioning of their coding
systems, the ISO 3166/MA may set aside code elements which it undertakes not
to use for other than specified purposes during a limited or indeterminate
period of time. These are called reserved code elements and their use is
normally restricted to the application they were reserved for.
(...)
Code elements not included in the current version of ISO 3166-1 may be
reserved by the ISO 3166/MA,
* (...)
* as "exceptional reservations", at the request of national ISO member
bodies, governments and international organizations. This applies to certain
code elements required in order to support a particular application, as
specified by the requesting body and limited to such use; any further use of
such code elements is subject to approval by the ISO 3166/MA.
[/quote]
So [BCP47] indicates that the [ISO3166-1] code "EU", which appears in the
list of alpha-2 country codes for the European Union, should be used, since
it has been reserved for an indeterminate period. BCP47 does not seem to
restrict the use of alpha-2 codes that are "exceptionally reserved".
For [ISO3166-1], the code "EU" is an exceptional reservation; its use in LDML
(if LDML is to become an international standard) would fit the requirement
that the code be needed "to support a particular application". All that is
needed is for Unicode to request approval from the ISO 3166/MA.
Why is LDML using the private use code "QU", apparently in contradiction
with BCP47? Shouldn't it be changed to "EU", following the BCP47
recommendation and LDML's own policy, which warns against private use codes
because they can change at any time?
Does Unicode intend to request approval from the ISO 3166/MA for the use of
the "EU" code in LDML and CLDR (as ISO 3166-1 provides for)? I think this
would be in the interest of the many applications that already use "EU" in
their localization data, and do NOT use "QU", because "QU" is a
"user-assigned code element" not meant for interchange.
Note that [ISO3166-1] also says:
[quote]
When exchanging data with users of ISO 3166-1 not connected to this
particular in-house application the definition of these user-assigned code
elements should be given.
[/quote]
This is what the LDML specification does, but is that enough to permit
interchange of data?