RE: Same language, two locales (RE: Locale string for Norwegian -

From: Doug Ewell (dewell@compuserve.com)
Date: Sat Sep 02 2000 - 00:20:15 EDT


Mike Ayers <Mike_Ayers@bmc.com> wrote:

> BTW, I've gotten confused during this thread over the naming of
> country codes, etc. There are ISO specs, RFCs, POSIX specs (and
> more?)... Is this information conveniently summarized anywhere so
> that I may enlighten myself?

Here's a convenient, if perhaps oversimplified, summary.

The standard for two-letter language codes is ISO 639-1. There is also
an ISO 639-2, which specifies three-letter language codes (actually two
sets of them, bibliographic and terminologic, which differ for a small
number of languages).

The standard for two-letter country codes is ISO 3166-1, which also
specifies collections of three-letter and numeric country codes. ISO
3166-2 specifies political subdivisions within a country.

RFC 1766 describes a way to use ISO 639-1 and 3166-1 to create language
tags for use on the Internet (e.g. in mail messages). A lowercase 639-1
language code can be followed by a hyphen and an uppercase 3166-1 country
code to represent the concept of "language X as spoken in country Y."
Unicode Technical Report #7, "Plane 14 Characters for Language Tags,"
recommends a slight adaptation of the RFC 1766 approach (both codes are
lowercase).
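
To make the two conventions concrete, here is a minimal sketch in
Python. The function names are made up for this example and are not
defined by RFC 1766 or UTR #7, and no check is made that the codes are
actually registered in ISO 639-1 or ISO 3166-1.

```python
def rfc1766_tag(language, country=None):
    """Build an RFC 1766-style tag: lowercase language code, then an
    optional hyphen and uppercase country code (hypothetical helper)."""
    tag = language.lower()
    if country:
        tag += "-" + country.upper()
    return tag


def utr7_tag(language, country=None):
    """Same structure, but with both codes in lowercase, following the
    adaptation recommended by UTR #7 (hypothetical helper)."""
    return rfc1766_tag(language, country).lower()


print(rfc1766_tag("no", "no"))  # Norwegian as spoken in Norway
print(utr7_tag("no", "no"))
```

The casing is normalized on the way in, so the caller can pass the codes
in either case.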

RFC 1766 is currently being revised to allow three-letter (639-2), as
well as two-letter (639-1), language codes. This will permit the use
of language tags for hundreds of less-common languages that have no two-
letter code. The revision will also provide ways to use 3166-2 country-
subdivision codes and (draft) ISO 15924 script codes in language tags.

Naturally, the revised version will not be called RFC 1766, but will be
assigned a new number. I don't know if UTR #7 will be updated to refer
to the new RFC when it is published (I think it should be).

POSIX locale names are also formed from 639-1 language codes and 3166-1
country codes. Unlike in RFC 1766, the elements are separated by an
underscore rather than a hyphen. POSIX uses this language/country code
to represent not only the language and local dialect, but all the
attributes of a locale setting, such as decimal separator, thousands
separator, currency symbol, default date format, etc. This two-part
naming scheme is widely regarded as inadequate for covering even a
reasonable subset of locale possibilities.
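
As a rough illustration of the difference in separators, the sketch
below (Python, with hypothetical function names) builds the
language_COUNTRY part of a POSIX locale name and converts a
hyphen-separated tag to that form. It assumes the tag has at most a
language and a country part, and it ignores anything else that real
locale names may carry.

```python
def posix_locale_name(language, country):
    """Join a 639-1 language code and a 3166-1 country code with an
    underscore, POSIX style (hypothetical helper)."""
    return language.lower() + "_" + country.upper()


def tag_to_posix(tag):
    """Convert a 'language-COUNTRY' tag to 'language_COUNTRY'.
    A tag with no country part is returned as the bare language code."""
    language, _, country = tag.partition("-")
    if not country:
        return language.lower()
    return posix_locale_name(language, country)


print(tag_to_posix("no-NO"))
print(tag_to_posix("fr"))
```

Note that nothing in the resulting name says which decimal separator or
date format applies; all of that is implied by the language/country pair
alone.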

There are other standards for language and country codes, but for our
purposes these are by far the most common.

-Doug Ewell
 Fullerton, California


