RE: Common Locale Data Repository Project

From: Peter Constable (petercon@microsoft.com)
Date: Sat Apr 24 2004 - 09:01:40 EDT


    > From: Philippe Verdy [mailto:verdy_p@wanadoo.fr]

    > What is already unstable in ISO639 is the deprecation of "iw" and the
    > addition of "he", same thing for "in" and "id" or for "yi" and "ji".
    > Don't you call that instability?

    I think there is a misunderstanding here. As I understand it, ISO 639-1
    actually never included "iw", "in" or "ji". But somehow, something got
    published listing those (I don't know the exact details). So there was
    mixed info out there indicating both "iw" and "he", etc. To resolve the
    apparent ambiguity, the ISO 639/RA-JAC had to state that the IDs "iw",
    "in" and "ji" were deprecated.
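
    A minimal sketch of how software might cope with the mixed data still
    in circulation: fold the three deprecated IDs onto their current
    equivalents before comparing tags. The table holds only the pairs named
    in this thread; the function name is a hypothetical illustration, not
    any library's API.

```python
# Deprecated two-letter IDs mentioned in this thread, mapped to the
# current ISO 639-1 codes. Illustrative table only, not a complete registry.
DEPRECATED_LANGUAGE_CODES = {"iw": "he", "in": "id", "ji": "yi"}


def normalize_language_code(code: str) -> str:
    """Return the current code for a deprecated ID; pass other codes through."""
    lowered = code.strip().lower()
    return DEPRECATED_LANGUAGE_CODES.get(lowered, lowered)
```

    With this, data tagged "iw" and data tagged "he" compare as equal after
    normalization, while any other code passes through unchanged apart from
    lower-casing.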

    > Think more recently about the new codification for Serbo-Croatian, and
    > the split of "sh", with no definition except that it is country based
    > (Serbian, Croatian, Bosnian, Montenegrin), assuming that one country
    > uses only one language when in fact there are several in the same one,
    > that are shared by multiple countries, and differ mostly by their
    > script...

    I don't disagree that there are some difficult areas, such as this.
    The differences intended by "sr", "bs" and "hr" do *not* have to do with
    script -- i.e. one cannot assume that any of these imply any particular
    script. They also don't imply a particular region (Serbian could be
    spoken outside Serbia), though clearly one country is most likely. They
    *do* imply linguistic differences. Here's the difficulty: in those
    countries, claims are made that there are linguistic differences, so
    much so that it is problematic to sell products there that claim support
    for "Serbo-Croatian". On the other hand, given a document in one of
    these, it's difficult to say that it's specifically one of them and not
    the other two. ISO 639-3 will provide a macro-language identifier for
    "Serbo-Croatian", so it will be possible to tag a document without
    making that distinction.
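
    As a rough sketch of what a macro-language identifier makes possible:
    two tags can be treated as compatible when they fall under the same
    macro-language. This assumes the identifier is "hbs", the code
    ISO 639-3 assigns to Serbo-Croatian; the table and function names are
    hypothetical illustrations.

```python
# Individual-language codes discussed above, mapped to the assumed
# ISO 639-3 macro-language identifier "hbs". Illustrative table only.
MACROLANGUAGE = {"sr": "hbs", "hr": "hbs", "bs": "hbs"}


def macrolanguage_of(code: str) -> str:
    """Return the macro-language for a code, or the code itself if none is listed."""
    lowered = code.strip().lower()
    return MACROLANGUAGE.get(lowered, lowered)


def tags_compatible(tag_a: str, tag_b: str) -> bool:
    """True when both tags are equal or share a macro-language."""
    return macrolanguage_of(tag_a) == macrolanguage_of(tag_b)
```

    A document tagged with the macro-language would then match a request
    for "sr", "hr" or "bs", without anyone having to decide which of the
    three it specifically is.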

     
    > Also if ISO3166 is unstable

    I made no claim regarding stability of ISO 3166.

    > Serbia-Montenegro?), then it introduces instability too within ISO 3066
    > or its proposed replacement

    1. It is an IETF specification, not an ISO standard; the designation is
    **RFC** 3066.

    2. The draft successor to RFC 3066 addresses this very issue.

    3. (a bit on the nit-picking side, IMO, but there have been three
    comments on this) RFC 3066 will be *superseded*, not replaced.

    > For now, the only workable solution to solve these issues is found in
    > supplementary libraries in ICU which support locale aliases. (Yes I use
    > the term Locale because this is the term that Java gives to this
    > identification,
    NO. That is the term Java (and other things) give to a *different*
    identification. There are languages, there are cultures/locales. The two
    are not the same.

    Peter Constable


