Re: Error on Language Codes page.

From: Doug Ewell (doug@ewellic.org)
Date: Sun Feb 01 2009 - 10:07:20 CST

    Philippe Verdy <verdy underscore p at wanadoo dot fr> wrote:

    >> That page continues to trouble me, because of its recommendation to
    >> use ISO 639-1 codes for Hebrew, Indonesian, and Yiddish that were
    >> withdrawn from that standard 20 years ago.
    >
    > These three cases are not a problem: did you note the asterisk after
    > these codes:

    The text explains the asterisk as identifying the older codes that users
    are being told to prefer over the newer codes.

    > they are also present in ISO 639, and mean deprecated codes.

    Codes that are withdrawn from a standard in the ISO 639 family are not
    still present in the standard. See the official text file provided by
    ISO 639-2/RA at:

    http://www.loc.gov/standards/iso639-2/ISO-639-2_utf-8.txt

    or the corresponding HTML versions, such as one sorted by ISO 639-2 code
    at:

    http://www.loc.gov/standards/iso639-2/php/code_list.php

    There are also lists sorted by English or French language name. You
    will not find the withdrawn codes in these lists, or anywhere on the
    RA's official site except on their change page, where they do use both
    "deprecated" and "withdrawn" to refer to these codes, which is
    misleading since these are not synonyms.
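
    As a rough way to check this yourself, here is a minimal Python sketch.
    It assumes the text file uses the pipe-delimited layout "alpha-3
    (bibliographic) | alpha-3 (terminologic) | alpha-2 | English name |
    French name". The three sample rows and the helper name alpha2_codes
    are only illustrative; in practice you would feed it the contents of
    the file downloaded from the RA's site.

```python
# Three sample rows in the assumed layout of the ISO 639-2 text file:
# alpha-3 (B) | alpha-3 (T) | alpha-2 | English name | French name
SAMPLE = """\
heb||he|Hebrew|hébreu
ind||id|Indonesian|indonésien
yid||yi|Yiddish|yiddish
"""

# The three alpha-2 codes withdrawn in 1989.
WITHDRAWN = {"iw", "in", "ji"}


def alpha2_codes(text):
    """Collect the non-empty alpha-2 codes from pipe-delimited rows."""
    codes = set()
    # A UTF-8 file may start with a byte order mark; drop it if present.
    for line in text.lstrip("\ufeff").splitlines():
        fields = line.split("|")
        if len(fields) >= 3 and fields[2]:
            codes.add(fields[2])
    return codes


current = alpha2_codes(SAMPLE)
# Any withdrawn code that still appears in the list would show up here.
still_listed = WITHDRAWN & current
print(sorted(current), sorted(still_listed))
```

    For the sample rows, still_listed comes out empty: only 'he', 'id',
    and 'yi' are present.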

    You can certainly find older lists, provided by third parties, that
    differ from the official standard. These lists are available at places
    like:

    http://ftp.ics.uci.edu/pub/ietf/http/related/iso639.txt

    As you can see from the header, the codes in this file were typed in by
    individuals not affiliated with ISO 639, and corrections were made
    whenever someone got around to making them. This file is a snapshot in
    time from 1996; you'll notice that codes added more recently, such as
    'gv' for Manx and 'kw' for Cornish, are not listed. This is why it is
    important for RAs and MAs of such standards to provide freely available,
    electronic access to current copies of their code lists (FTP, gopher, or
    bulletin boards would have been available to ISO 639 before the Web was
    available).

    > The HTML page above correctly gives the current recommended codes (the
    > other codes with the asterisk are non-recommended codes, that are
    > still implicitly aliases that may be supported as they have still not
    > been reassigned to other languages;

    They aren't still supported by the ISO 639 authorities. The reason they
    have not been reassigned is so that *older, existing* data that uses
    these codes can still be interpreted correctly. That is very different
    from encouraging people to continue using these codes going forward.
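
    To illustrate the distinction, here is a minimal Python sketch of how
    an application might read the withdrawn codes in old data while never
    writing them itself. The table and the function name
    normalize_language_code are hypothetical, not taken from any standard
    or library; only the three code pairs come from ISO 639.

```python
# Withdrawn ISO 639 alpha-2 codes and their current replacements.
WITHDRAWN_TO_CURRENT = {
    "iw": "he",  # Hebrew
    "in": "id",  # Indonesian
    "ji": "yi",  # Yiddish
}


def normalize_language_code(code):
    """Map a withdrawn code to its current equivalent.

    Codes that are not in the table are returned unchanged (lowercased),
    so existing data can be read without emitting the old codes again.
    """
    code = code.lower()
    return WITHDRAWN_TO_CURRENT.get(code, code)


print(normalize_language_code("iw"))
print(normalize_language_code("fr"))
```

    The mapping is applied only on input. Anything the application
    generates going forward would carry the current code.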

    > anyway, there will probably be no more alpha-2 code assigned in any
    > part of ISO 639,

    While probably true, this has little or no relevance to the rest of the
    thread.

    > so even if those aliases are not recommended, they are still usable
    > by applications that still use them for legacy reasons: Java for
    > example still supports "iw" internally).

    If Java requires the use of the old codes, then the page on the Unicode
    site should specify Java as an application that requires the use of the
    old codes.

    'in' and 'iw' and 'ji' were withdrawn from ISO 639 in 1989. That was a
    *long* time ago in computing. Telling people that they should "write
    the [oldest] one... for legacy applications that cannot manage correctly
    the new standard code or for classes of applications for which you are
    not certain that they can use the new standard," without citing specific
    legacy applications that have this constraint, is like telling people
    that they should continue to use the old Unicode 1.1 Hangul syllables in
    the U+3400 to U+4DFF range instead of the newfangled Unicode 2.0 Hangul
    syllables.

    In other news, I agree that the reference to "Bhutani" should be
    corrected.

    --
    Doug Ewell  *  Thornton, Colorado, USA  *  RFC 4645  *  UTN #14
    http://www.ewellic.org
    http://www1.ietf.org/html.charters/ltru-charter.html
    http://www.alvestrand.no/mailman/listinfo/ietf-languages