I did a little homework on this topic and made some discoveries.
First, a colleague of mine at work checked with his friend, the ANSI
representative to ISO TC212, who claimed that all references to ISO
standards in standards-track documents are to the most current version
of that standard. I don't know if he meant in *any* standards-track
documents or only in other ISO standards, but the point was that nobody
is supposed to be forced to follow ISO standards that have been
superseded.
Second, and more to the point, I poked around the Everson Gunn Teoranta
site for a while -- amazing, the things you find in there! -- and
discovered that an Internet Draft revision of RFC 1766 is in the works.
(Look for "draft-alvestrand-lang-tags-v2-01.txt" at IETF or other fine
FTP sites.) There is lots of new stuff to read: you will find, for
instance, that ISO 639-2 three-letter language codes, as well as ISO
639(-1) two-letter language codes, will soon be allowed in language
tags, and that these tags will also support ISO 3166-2 region codes and
ISO/DIS 15924 script codes (presumably after 15924 gets out of the DIS
stage).
This is also where I learned the rather surprising news that the ISO
639(-1) list of two-letter codes will soon be frozen, meaning in
particular that no new two-letter codes will be assigned to languages
that already have a three-letter code. This means that some major,
significant languages like Turkish and Yoruba will never get two-letter
codes, which seems odd somehow. Fortunately, the expansion of this I-D
to include ISO 639-2 means that it can finally be used to encode Turkish
et al. after all.
What is most relevant to this discussion, though, is the way the I-D
handles the issue of updated ISO standards. Note the following wording
from the I-D:
> All 2-letter tags are interpreted according to *ISO 639:1988*, "Code
> for the representation of names of languages" [ISO 639] *and
> subsequent additions made by its Registration Authority*.
>
> *Note: this is currently under revision as ISO/DIS 639-1:2000.*
>
> All 3-letter tags are interpreted according to *ISO 639-2:1998*,
> "Code for the representation of names of languages -- Part 2: Alpha-3
> code" [ISO 639-2] *and subsequent additions made by its Registration
> Authority*.
(all emphasis original)
This means Alvestrand and colleagues recognized the ambiguity of
specifying ISO 639:1988 in the original RFC 1766 and have explicitly
permitted codes from updated lists in the new I-D.
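
To make the quoted rule concrete, here is a minimal sketch in Python of how an implementation might decide which standard governs the first part of a language tag. It follows only the wording quoted above (two letters means ISO 639, three letters means ISO 639-2); the function name and the returned labels are my own inventions for illustration, not anything defined by the I-D.

```python
def governing_standard(tag):
    """Name the standard that governs the primary subtag of a language tag.

    Hypothetical helper: the name and return strings are illustrative only.
    """
    # The primary subtag is everything before the first hyphen.
    primary = tag.split("-")[0]
    if not (primary.isascii() and primary.isalpha()):
        raise ValueError("primary subtag must consist of ASCII letters: %r" % tag)
    if len(primary) == 2:
        # Interpreted per ISO 639 plus later Registration Authority additions.
        return "ISO 639"
    if len(primary) == 3:
        # Interpreted per ISO 639-2 plus later Registration Authority additions.
        return "ISO 639-2"
    # Anything else (such as the "i" and "x" prefixes of RFC 1766) is not
    # covered by the wording quoted above.
    return "other"


for example in ("en-US", "haw", "x-private"):
    print(example, "->", governing_standard(example))
```

Note that the check looks only at the length of the primary subtag, never at a fixed list of codes, which is exactly what lets a tag keep working when the Registration Authority adds a code later.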
Of course, the real hazards come whenever a standard (XML specification,
Unicode Technical Report, or whatever) relies on an Internet RFC. For
one thing, an RFC is not "updated"; rather, it is "obsoleted" by a new
RFC with a new number. If your standard or spec references an RFC, it
is outdated as soon as the referenced RFC is replaced. How many
documents are still out there that claim that MIME is defined by RFCs
1521 and 1522? It was, once upon a time, but those RFCs were replaced
almost four years ago by RFCs 2045 through 2049.
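
The stale-reference problem can be sketched as a lookup that follows "obsoleted by" links until it reaches documents that have not been replaced. This is only an illustration in Python: the table below is hand-entered and deliberately tiny, the names `OBSOLETED_BY` and `current_rfcs` are hypothetical, and a real tool would have to take its data from the RFC index rather than from a literal like this one.

```python
# Hand-entered, illustrative table: RFC number -> RFCs that obsoleted it.
# Assumption: a real checker would load this from the RFC index instead.
OBSOLETED_BY = {
    1341: [1521],
    1521: [2045, 2046, 2047, 2048, 2049],
}


def current_rfcs(number, table=OBSOLETED_BY):
    """Follow 'obsoleted by' links and return the RFCs currently in force."""
    successors = table.get(number)
    if not successors:
        # Nothing in the table replaces this RFC, so it is still current.
        return [number]
    result = []
    for successor in successors:
        for rfc in current_rfcs(successor, table):
            if rfc not in result:
                result.append(rfc)
    return result


print(current_rfcs(1521))
print(current_rfcs(1341))
```

A spec that cites RFC 1521 by number gets none of this for free; somebody has to notice the replacement and issue a correction.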
Another problem is that RFCs are not necessarily written with the same
attention to detail, precision, and completeness as ISO or national
standards. Some are written very well indeed, but there are no
guarantees. The present problem with imprecise wording in RFC 1766 is
evidence of this.
The mere fact that a document exists as an RFC should mean little.
Remember that RFCs used to be written to announce the postponement of
local meetings due to schedule conflicts and such (and of course there
are the famous "joke" RFCs like "ARPAWOCKY," which are nonetheless part
of the "official" RFC series right alongside RFCs 2152, 2279, and our
friend 1766).
Eric Raymond writes glowingly in the Jargon File about the advantage
that RFCs have over the "more formal, committee-driven process" of ISO
and national standards, but the flip side of that coin is that the less
formal RFC process sometimes results in documents with holes and
implicit assumptions of the kind that drive me, Mike Brown, and probably
many more of you crazy.
Reading RFC 1766, the new I-D that is destined to replace 1766 in the
near future, and UTR #7 leads me to conclude that UTRs should
*definitely* be revised to refer directly to ISO standards whenever
possible, instead of Internet RFCs, even if the idea comes from an RFC
and significant wording must be cut and pasted from an RFC. If that
happens, then the UTC (which we know is up to the task) can take the
responsibility for updates and clarifications, so that ambiguities of
the type Mike has been experiencing with the XML spec will not plague
implementors of the Unicode Standard.
-Doug Ewell
Fullerton, California