The "QU" territory/region code (was New Public Review Issue: #116 Proposed Update UTS #35 LDML)

From: Philippe Verdy (verdy_p@wanadoo.fr)
Date: Wed Nov 07 2007 - 15:55:20 CST


    Rick McGowan wrote:
    > The Unicode CLDR committee is planning to release a minor version, 1.5.1,
    > by the end of November. There are a few changes in the specification
    > associated with this change.
    >
    > http://unicode.org/draft/reports/tr35/tr35.html
    > Notable changes include:
    > * Added C10. Likely Subtags for locale IDs or language tags.

    One problem about a "private use" territory code currently used (QU):

    [quote[TR35]]
       3. Identifiers
       (...)
       A locale ID is an extension of a language ID, and thus the structure and
       field values are based on [BCP47]. The registry of data for that
       successor is now being maintained by IANA. The canonical form of a locale
       ID uses "_" instead of the "-" used in [BCP47]; however, implementations
       providing APIs for CLDR locale IDs should treat "-" as equivalent to "_"
       on input.
       (...)
                        Locale Field Definitions
       --------------  ----------  ------------------------------------------
       Field           Allowable   Allowable values
                       characters
       --------------  ----------  ------------------------------------------
       (...)
       territory_code  ASCII       [BCP47] subtag values marked as Type:
                       letters,    region, or any UN M.49 code that doesn't
                       numbers     correspond to a [BCP47] region subtag.
                                   There are three private use codes defined
                                   in LDML:
                                       QO  Outlying Oceania
                                       QU  European Union
                                       ZZ  Unknown or Invalid Territory
                                   The private use codes from XA..XZ will
                                   never be used by CLDR, and are thus safe
                                   for use for other purposes by
                                   applications using CLDR data.
       --------------  ----------  ------------------------------------------
    [/quote[TR35]]
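
A minimal sketch of the two rules in the quoted TR35 text: "-" is accepted as equivalent to "_" on input, and XA..XZ are left free for applications. The names `normalize_separators`, `is_safe_for_applications` and `LDML_PRIVATE_USE` are hypothetical, chosen for illustration only; they are not part of any CLDR API, and the sketch does no validation of the other fields.

```python
# Private use territory codes defined in LDML, copied from the table quoted above.
LDML_PRIVATE_USE = {
    "QO": "Outlying Oceania",
    "QU": "European Union",
    "ZZ": "Unknown or Invalid Territory",
}


def normalize_separators(locale_id: str) -> str:
    """Treat "-" as equivalent to "_" on input and return the "_" form.

    Hypothetical helper: it only rewrites the separator and does not
    validate or re-case the individual fields.
    """
    return locale_id.replace("-", "_")


def is_safe_for_applications(territory: str) -> bool:
    """True for XA..XZ, which the quoted text says CLDR will never use."""
    return (
        len(territory) == 2
        and territory[0] == "X"
        and "A" <= territory[1] <= "Z"
    )


if __name__ == "__main__":
    for tag in ("en-US", "fr_FR", "en-QU"):
        print(tag, "->", normalize_separators(tag))
```

Rewriting only the separator keeps the sketch within what the quoted paragraph actually states; anything beyond that (case folding, field validation) would be an extra assumption.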

    Now let's look at the normative [BCP-47] reference:

    [quote[BCP-47]]
    2.2.4. Region Subtags
       (...)
       The following rules apply to the region subtags:
       (...)
       2. All two-character subtags following the primary subtag were
           defined in the IANA registry according to the assignments found
           in [ISO3166-1] ("Codes for the representation of names of
           countries and their subdivisions -- Part 1: Country codes") using
           the list of alpha-2 country codes, or using assignments
           subsequently made by the ISO 3166 maintenance agency or governing
           standardization bodies.
    [/quote]

    Note that [BCP47] cites [ISO3166-1] as a source of codes, but it ***forgets
    to list it among the normative references*** at the end of the document.
    It is also imprecise about which list is effectively used: it only gives
    the title of the whole document within the text itself ("Codes for the
    representation of names of countries and their subdivisions -- Part 1:
    Country codes") and refers to the "list of alpha-2 country codes"; it speaks
    of "assignments", but does not indicate their normative status.

    From there, I can find this official page:
    http://www.iso.org/iso/iso-3166-1_decoding_table where the "EU" code is
    shown on a yellow background, marked as "exceptional reservations". This
    links to this page:
    http://www.iso.org/iso/customizing_iso_3166-1.htm, which says:

    [quote]
    To avoid transitional application problems and to aid users who require
    specific additional code elements for the functioning of their coding
    systems, the ISO 3166/MA may set aside code elements which it undertakes not
    to use for other than specified purposes during a limited or indeterminate
    period of time. These are called reserved code elements and their use is
    normally restricted to the application they were reserved for.
    (...)
    Code elements not included in the current version of ISO 3166-1 may be
    reserved by the ISO 3166/MA,
    * (...)
    * as "exceptional reservations", at the request of national ISO member
    bodies, governments and international organizations. This applies to certain
    code elements required in order to support a particular application, as
    specified by the requesting body and limited to such use; any further use of
    such code elements is subject to approval by the ISO 3166/MA.
    [/quote]

    So [BCP47] indicates that the [ISO3166-1] country code "EU", listed in the
    list of alpha-2 country codes for the European Union, should be used, as it
    was reserved for an indeterminate time. BCP47 does not seem to restrict the
    use of alpha-2 codes that were "exceptionally reserved".

    For [ISO3166-1], the code "EU" is an exceptional reservation; its use in LDML
    (if it is to become an international standard) would conform to the required
    "support for a particular application". All that is needed is for Unicode
    to request approval from the ISO 3166/MA.

    Why is LDML using the private use code "QU", apparently in contradiction
    with BCP47? Shouldn't it be changed to use "EU" according to the BCP47
    recommendation and the other policy in LDML that warns against the use of
    private use codes that can be changed at any time?

    Does Unicode want to request approval from the ISO 3166/MA for the use of the
    "EU" code in LDML and CLDR (as indicated in ISO3166-1)? I think it would be in
    the interest of the many applications that already use "EU" in their
    localization data, and NOT "QU", because the latter is a "user-assigned code
    element" not meant for interchange.
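
To make the distinction between the two kinds of codes concrete, here is a rough sketch that classifies an alpha-2 code. It assumes the ISO 3166-1 user-assigned ranges AA, QM..QZ, XA..XZ and ZZ (these ranges are not quoted in this message), and its set of exceptional reservations contains only "EU", the one discussed here; the real decoding table lists others. The function names are hypothetical.

```python
# Only the exceptional reservation discussed in this message is listed;
# the ISO 3166-1 decoding table contains others (assumption: partial set).
EXCEPTIONALLY_RESERVED = {"EU"}


def is_user_assigned(code: str) -> bool:
    """True if code falls in the user-assigned ranges AA, QM..QZ, XA..XZ, ZZ."""
    if len(code) != 2 or not (code.isascii() and code.isalpha() and code.isupper()):
        return False
    return (
        code in ("AA", "ZZ")
        or (code[0] == "Q" and code[1] >= "M")
        or code[0] == "X"
    )


def classify(code: str) -> str:
    """Return a coarse status label for an upper-case alpha-2 code."""
    if is_user_assigned(code):
        return "user-assigned"
    if code in EXCEPTIONALLY_RESERVED:
        return "exceptionally reserved"
    # Officially assigned, other reservations and unassigned codes are
    # deliberately not distinguished here.
    return "not classified by this sketch"


if __name__ == "__main__":
    for code in ("QU", "QO", "ZZ", "EU", "FR"):
        print(code, "->", classify(code))
```

Under these assumptions, all three LDML private use codes (QO, QU, ZZ) land in the user-assigned ranges, while "EU" does not.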

    Note that [ISO3166-1] also says:

    [quote]
    When exchanging data with users of ISO 3166-1 not connected to this
    particular in-house application the definition of these user-assigned code
    elements should be given.
    [/quote]

    This is what the LDML specification does, but is it enough to permit
    interchange of data?


