RE: Question about new locale language tags

From: Michael Maxwell (mmaxwell@casl.umd.edu)
Date: Wed Dec 20 2006 - 08:59:46 CST


    Doug Ewell wrote:
    > Normally it's recommended to wait until ISO 639-3 is
    > published and then use those codes instead of the
    > Ethnologue codes (which might not match 100%).

    The 14th edition Ethnologue codes were upper-case, and they are still visible in various places (including in searches at ethnologue.org). The 15th edition Ethnologue codes are lower-case, and they incorporate some changes to use 639-2 codes where appropriate (as well as other changes that would have happened between editions even in the absence of the whole ISO thing). And of course the Ethnologue codes are a subset of the 639-3 codes.

    > What other "problems" of this sort are supposed to be
    > present in ISO 639-3?

    There's a long list, at http://www.sil.org/iso639-3/macrolanguages.asp, of cases where 639-2 (not 639-3) had a code for something that either was not a language by a linguistic definition but rather a group of languages (linguistically motivated or not), or was vague. 'Arabic', for example, is not a single language but a group of varieties ranging from mutually unintelligible to perhaps mutually intelligible, together with Modern Standard Arabic (MSA), which is no one's native language but is understood and spoken by educated people across the region. (MSA is also the only standardized written form of Arabic, which makes it relevant to tagging text. You can find "dialectal" Arabic in writing, but there is no standard for it.)

    Descriptions of some of the other problems (and of the difficulties inherent in any such classification) are at http://www.sil.org/silewp/2002/SILEWP2002-004.pdf (also at http://unicode.org/notes/tn8/SILEWP2002-004.pdf) and http://unicode.org/notes/tn8/SILEWP2002-003.pdf. (These were written several years ago and therefore reflect the 639-2 standard.)

    >> ISO 639-3 is based on the Ethnologue codes (with some modifications),
    >> plus codes for long extinct and made-up languages (including
    >> everyone's favorite, Klingon).
    >
    > 1. Encoding extinct languages is a design goal for ISO 639-3,
    > not an error.

    Of course!

    > 2. All languages are "made-up"; they are human inventions and
    > do not occur in nature. Constructed languages such as Esperanto
    > and Ido are also present in ISO 639-1 and -2.

    I could turn this into a long rambling discussion about how non-constructed languages do occur in nature, but I'll refrain :-). (Also, my point wasn't about 639-1/-2, but rather that most constructed languages weren't, and still aren't, in the Ethnologue.)

       Mike Maxwell
       CASL/ U MD


