CLDR Ticket #5089 (closed defect: fixed)
UTS 35 unsuited as a reference for BCP 47 extensions
Reported by: norbert | Owned by: mark
My perspective is that of the editor of a specification that normatively relies on BCP 47, including the Unicode Locale Extension, but cannot normatively rely on CLDR because some implementors of the specification just can't use CLDR.
From this perspective, UTS 35 is a total mess.
Start with the entry point: RFC 6067, BCP 47 Extension U, points to section 3 of UTS 35. That section is entitled “Unicode Language and Locale Identifiers”. Unicode language and locale identifiers are not BCP 47 language tags; they are traditional ICU locale identifiers with some BCP 47 style enhancements. The section serves primarily to define Unicode identifiers, with various annotations on how they are similar to or different from BCP 47 language tags.
In the midst of this section are two parts that actually seem relevant to RFC 6067:
1) The key/type definitions table. But again, this table is an amalgamation of information relevant to RFC 6067 (references to XML files in the bcp47 directory, keys, types), information relevant only to Unicode identifiers (old key names, old type names), and information whose status is unclear (references to various other sections of UTS 35).
2) Subsection 3.2.1, which defines the canonicalization of Unicode locale extension sequences.
Both of these reference Appendix Q, Unicode BCP 47 Extension Data, which is clearly also relevant to BCP 47, except for the alias attributes, which belong to the world of Unicode locale identifiers and add no value to BCP 47.
Maybe there is more information relevant to the BCP 47 Unicode Locale Extension that I missed. I wouldn't be surprised.
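To illustrate what an implementor needs from the canonicalization part alone, here is a minimal Python sketch of canonicalizing the "u" extension sequence of a language tag. It assumes the rules are: lowercase all extension subtags, order keywords alphabetically by key, and keep the order of the type subtags within a keyword. Sorting the attributes is a further assumption of this sketch. The function name `canonicalize_u_extension` is hypothetical, and the code assumes a well-formed, non-grandfathered tag; it does no validation against the key/type data.

```python
def canonicalize_u_extension(tag: str) -> str:
    """Return `tag` with its 'u' extension sequence in canonical order.

    Sketch only: assumes a well-formed, non-grandfathered BCP 47 tag.
    """
    subtags = tag.split("-")

    # Locate the 'u' singleton. Anything after an 'x' singleton is private use.
    start = None
    for i, subtag in enumerate(subtags[1:], start=1):
        if len(subtag) == 1:
            if subtag.lower() == "x":
                break
            if subtag.lower() == "u":
                start = i
                break
    if start is None:
        return tag

    # The extension runs until the next singleton or the end of the tag.
    end = start + 1
    while end < len(subtags) and len(subtags[end]) != 1:
        end += 1

    attributes = []
    keywords = []  # list of (key, [type subtags])
    for subtag in (s.lower() for s in subtags[start + 1:end]):
        if len(subtag) == 2:
            keywords.append((subtag, []))       # a two-letter subtag starts a keyword
        elif keywords:
            keywords[-1][1].append(subtag)      # type subtag of the current keyword
        else:
            attributes.append(subtag)           # subtags before the first key

    attributes.sort()                           # assumption of this sketch
    keywords.sort(key=lambda keyword: keyword[0])

    extension = ["u"] + attributes
    for key, types in keywords:
        extension.append(key)
        extension.extend(types)                 # type order is preserved

    return "-".join(subtags[:start] + extension + subtags[end:])
```

For example, `canonicalize_u_extension("de-DE-u-nu-latn-co-phonebk")` yields `de-DE-u-co-phonebk-nu-latn`. Nothing in this sketch needs the alias attributes or any other Unicode locale identifier material.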
I think it was a mistake to use UTS 35 as the specification for the two BCP 47 extensions - two entirely separate documents should have been created defining them. But splitting UTS 35 into three separate documents now would require updating two RFCs, which is probably more pain than it’s worth.
Therefore, I’d like to propose to restructure section 3 as follows:
- Start with an explanatory statement: “This section consists of three parts: Subsection 3.1 specifies the subtags of the BCP 47 Extension U, Unicode Locale (RFC 6067). Subsection 3.2 specifies the subtags of the BCP 47 Extension T, Transformed Content (RFC 6497). Subsection 3.3 specifies the Unicode language and locale identifiers used in CLDR, as well as their relationship to BCP 47.”
- Then reorganize the section into the three subsections as described above. Start each subsection again with a statement “This subsection specifies …”.
- Move the key/type definitions table into subsection 3.1, but remove the old key names and old type names, as well as any other material that’s not relevant for BCP 47. Basically, the table should say what keys and types mean, but not how one might implement them using CLDR.
- Create a new table in subsection 3.3 that maps between BCP 47 extension subtags and their old Unicode locale identifier equivalents.
- Move the content of subsection 3.2.1 into the new subsections 3.1 and 3.2 as appropriate.
- Include normative references to Appendix Q in subsections 3.1 and 3.2.
- Remove the alias attributes from the bcp47 files.
- Move any other information related to BCP 47 extensions U and T, except that in Appendix Q, into subsections 3.1 and 3.2.
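The mapping table proposed for subsection 3.3 could be thought of along the lines of the following Python sketch. The entries shown are a small illustrative sample, not the full data; the real mapping would come from the alias information currently carried in the bcp47 files. The names `KEY_ALIASES`, `TYPE_ALIASES`, and `to_legacy_keywords` are hypothetical.

```python
# Illustrative sample only: BCP 47 key -> old Unicode locale identifier key.
KEY_ALIASES = {
    "ca": "calendar",
    "co": "collation",
    "nu": "numbers",
}

# Illustrative sample only: (BCP 47 key, BCP 47 type) -> old type name.
TYPE_ALIASES = {
    ("ca", "gregory"): "gregorian",
    ("co", "phonebk"): "phonebook",
}


def to_legacy_keywords(keywords: dict[str, str]) -> str:
    """Render BCP 47 'u' keywords in the old '@key=type;key=type' style.

    Keys and types without an entry in the sample tables are passed through
    unchanged. Keywords are emitted in order of their BCP 47 key.
    """
    parts = []
    for key in sorted(keywords):
        value = keywords[key]
        old_key = KEY_ALIASES.get(key, key)
        old_type = TYPE_ALIASES.get((key, value), value)
        parts.append(f"{old_key}={old_type}")
    return "@" + ";".join(parts) if parts else ""
```

With these sample tables, the keywords of `de-DE-u-co-phonebk-nu-latn` map to `@collation=phonebook;numbers=latn`. Keeping this mapping in subsection 3.3 would let subsection 3.1 describe keys and types without reference to the old names.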
- Owner changed from anybody to mark
- Priority changed from assess to major
- Status changed from new to assigned
- Milestone changed from UNSCH to 22