I am working on a new locale proposal. This seems like a great time to jump
in. Unfortunately I have not been able to pull the mail archives to go
back over the latest discussions, so I have probably missed a lot.
The locale will consist of three parts:
1) A modified, lower-case RFC 1766bis language tag
2) An ISO 3166 country code
3) A variant
The three parts are separated with underscores to distinguish them from the
'-' separators used within each part.
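As a minimal sketch (Python, not part of the proposal), splitting a locale ID on underscores only, and at most twice, keeps the '-' inside a language tag such as zh-hak intact. The `Locale` type and the `parse_locale`/`format_locale` names are hypothetical, and writing an empty country as an empty field (en__X) when a variant is present is an assumption the proposal does not settle.

```python
from typing import NamedTuple


class Locale(NamedTuple):
    language: str      # modified RFC 1766bis tag; may itself contain '-'
    country: str = ""  # ISO 3166 country code, e.g. "US"
    variant: str = ""  # locale variant, e.g. "EURO"


def parse_locale(locale_id: str) -> Locale:
    # Split on '_' only, at most twice, so a '-' inside the language tag
    # (e.g. "zh-hak") is never mistaken for a part separator.
    return Locale(*locale_id.split("_", 2))


def format_locale(loc: Locale) -> str:
    # Drop trailing empty parts. An empty country followed by a variant is
    # kept as an empty field (hypothetical "xx__VARIANT" convention).
    parts = [loc.language, loc.country, loc.variant]
    while len(parts) > 1 and not parts[-1]:
        parts.pop()
    return "_".join(parts)
```

For example, `parse_locale("fr_FR_EURO")` yields the three parts fr, FR and EURO, and `format_locale` turns them back into the same string.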
From RFC 1766bis:
The primary tag must be one of:
An ISO 639 2-letter language code
An ISO 639-2 3-letter language code
i-
x-
The first subtag when following a 2-letter or 3-letter code is
distinguished as follows:
If 2-letter, it is an ISO 3166-1 country code
If 3-letter, it is an ISO 639-2 language code
If 4-letter, it is an ISO/DIS 15924 script code
If 5-8 letters, it may be of any value
The first subtag when following i- or x- may have 1-8 letters and may
represent any value.
The second subtag is distinguished as follows:
If 2-letter, it is an ISO 3166-2 region code
If 3-letter, it is an ISO 639-2 language code
If 4-letter, it is an ISO/DIS 15924 script code
If 5-8 letters, it may be of any value
Subsequent subtags may have any value.
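The subtag rules quoted above can be sketched as a classifier. This is a hypothetical Python illustration, not code from the draft: it checks only the shape of each subtag (ASCII letters, length, position) and does not look codes up in the ISO 639, 3166 or 15924 lists. Beyond the quoted text, it assumes that every subtag after i- or x- is free-form; in RFC 1766, i- marks IANA-registered tags and x- marks private-use tags.

```python
import re

# Meaning of a subtag by length, for the first and second subtag positions.
_FIRST = {2: "ISO 3166-1 country", 3: "ISO 639-2 language", 4: "ISO/DIS 15924 script"}
_SECOND = {2: "ISO 3166-2 region", 3: "ISO 639-2 language", 4: "ISO/DIS 15924 script"}
_LETTERS = re.compile(r"[A-Za-z]{1,8}")


def classify_tag(tag: str) -> list[tuple[str, str]]:
    """Label each subtag of an RFC 1766bis-style tag (shape checks only)."""
    subtags = tag.lower().split("-")
    if not all(_LETTERS.fullmatch(s) for s in subtags):
        raise ValueError(f"subtags must be 1-8 ASCII letters: {tag!r}")
    primary, rest = subtags[0], subtags[1:]
    if primary in ("i", "x"):
        # Assumption: everything after i- or x- is free-form.
        kind = "IANA-registered" if primary == "i" else "private use"
        return [(primary, kind)] + [(s, "any value") for s in rest]
    if len(primary) == 2:
        result = [(primary, "ISO 639 language")]
    elif len(primary) == 3:
        result = [(primary, "ISO 639-2 language")]
    else:
        raise ValueError(f"invalid primary tag: {primary!r}")
    for position, sub in enumerate(rest):
        table = _FIRST if position == 0 else _SECOND if position == 1 else None
        if table is None:
            result.append((sub, "any value"))       # subsequent subtags
        elif len(sub) in table:
            result.append((sub, table[len(sub)]))   # 2-, 3- or 4-letter code
        elif 5 <= len(sub) <= 8:
            result.append((sub, "any value"))       # 5-8 letters
        else:
            raise ValueError(f"invalid subtag: {sub!r}")
    return result
```

Under these rules zh-hak-cn reads as a language, a 639-2 sub-language and an ISO 3166-2 region, while no-nynorsk ends in a free-form subtag.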
The modifications to RFC 1766bis that make it better suited for locales are
as follows:
1) Normalize to a single form when possible. Use the ISO 639-1 code instead
of the 639-2 code if one exists (eng-us -> en_US). Use a single language
designation rather than language/variant (e.g. no-nynorsk -> nn_NO). Replace
obsolete values with the new codes (iw -> he). Replace i- codes with SIL
codes, except for Klingon.
2) Country codes that are part of an RFC 1766 tag become locale country
codes, e.g. en-us -> en_US.
3) Variants that are not related to language are locale variants,
e.g. fr_FR_EURO.
4) Break the "Applications should always treat a language tag as a single
token" rule by specifying two forms of RFC 1766 languages: either an ISO
639-1 or 639-2 code, or a 639-1/639-2 sub-language pair. In the case of the
639-1/639-2 pair, the program will iterate through its resources (Hakka
first, then Chinese), e.g. zh-hak_CN. You certainly do not want to replicate
the entire set of Chinese language resources for Hakka. Taiwanese Hakka
would be zh-hak_TW. Languages with a 639-2 code whose resources fully
replace, rather than extend, another language's are specified without a
sub-language, e.g. haw_US.
5) Convert the non-human locales "C" and "POSIX" to human locales, e.g. en_US.
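To make modifications 1 to 5 concrete, here is a rough Python sketch. The mapping tables are hypothetical and deliberately tiny (real ones would be generated from the ISO 639 lists, and no SIL mappings are shown), the variant is passed separately because it is not part of the language tag, subtags other than a country code or a 639-2 sub-language are ignored, and the fallback order in `resource_chain` (sub-language before parent language, most specific country/variant first) is an assumption rather than something the proposal specifies.

```python
# Hypothetical, deliberately tiny tables; real ones would be generated
# from the ISO 639 code lists.
ISO639_2_TO_1 = {"eng": "en", "fra": "fr", "zho": "zh", "nor": "no"}
OBSOLETE = {"iw": "he", "in": "id", "ji": "yi"}
# (language, variant subtag) -> (single language code, country code)
LANGUAGE_VARIANTS = {("no", "nynorsk"): ("nn", "NO")}
NON_HUMAN = {"C": "en_US", "POSIX": "en_US"}


def to_locale(tag: str, variant: str = "") -> str:
    """Convert an RFC 1766bis-style tag to a language_COUNTRY_VARIANT id."""
    if tag in NON_HUMAN:                   # 5) C/POSIX become human locales
        return NON_HUMAN[tag]
    subtags = tag.lower().split("-")
    lang = subtags.pop(0)
    lang = ISO639_2_TO_1.get(lang, lang)   # 1) prefer the 639-1 code
    lang = OBSOLETE.get(lang, lang)        # 1) replace obsolete codes
    sub_lang, country = "", ""
    if subtags and (lang, subtags[0]) in LANGUAGE_VARIANTS:
        # 1) single language instead of language/variant
        lang, country = LANGUAGE_VARIANTS[(lang, subtags.pop(0))]
    for sub in subtags:
        if len(sub) == 3 and not sub_lang:
            sub_lang = sub                 # 4) 639-2 sub-language, e.g. hak
        elif len(sub) == 2 and not country:
            country = sub.upper()          # 2) country code, e.g. US
        # Scripts and free-form subtags are ignored in this sketch.
    language = f"{lang}-{sub_lang}" if sub_lang else lang
    parts = [language, country, variant.upper()]  # 3) locale variant
    while len(parts) > 1 and not parts[-1]:
        parts.pop()
    return "_".join(parts)


def resource_chain(locale_id: str) -> list[str]:
    """Resource bundle names to try, most specific first (assumed order)."""
    language, _, rest = locale_id.partition("_")
    suffixes = []
    if rest:
        suffixes.append(rest)              # e.g. "CN" or "FR_EURO"
        country = rest.split("_", 1)[0]
        if country and country != rest:
            suffixes.append(country)       # same country, variant dropped
    suffixes.append("")                    # bare language
    languages = [language]
    if "-" in language:
        languages.append(language.split("-", 1)[0])  # zh-hak -> zh
    return [f"{lang}_{s}" if s else lang for lang in languages for s in suffixes]
```

With this ordering, `resource_chain(to_locale("zh-hak-tw"))` tries zh-hak_TW, zh-hak, zh_TW and then zh, so Hakka only needs the resources that differ from Chinese, while haw_US yields just haw_US and haw, which is how a total-replacement language avoids inheriting another language's resources.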
Carl
This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:21:13 EDT