Re: New Locale Proposal

From: Antoine Leca (Antoine.Leca@renault.fr)
Date: Mon Sep 18 2000 - 06:11:33 EDT


I do not know if this proposal is good or evil.
But in any case there are some points that need to be improved IMHO.

Carl W. Brown wrote:
>
> The locale will consist of three parts:
>
> 1) A modified lower case RFC 1766bis language
>
> 2) An ISO 3166 country code

Can you allow for areas that are a little bigger?
The first obvious case is the EU (but I believe it may soon get an
ISO 3166 code). Problematic cases also include the Arabic countries
and Spanish America, where a shared language combined with
differences between countries creates a long list of almost completely
virtual locales (that is, apart from the need to tag monetary amounts,
these locales carry no information). The same problem applies to French
in Africa and, to a lesser extent, to English over wide areas of the Earth.
 
> 3) A variant
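
To make the three-part structure concrete, here is a minimal sketch of
splitting such a tag. The underscore separator is taken from the
"fr_FR_EURO" example in the proposal; the names Locale and parse_locale
are hypothetical and are not part of the proposal.

```python
from typing import NamedTuple


class Locale(NamedTuple):
    language: str  # lower-case language code, e.g. "fr"
    country: str   # upper-case ISO 3166 code, or "" if absent
    variant: str   # free-form variant, or "" if absent


def parse_locale(tag: str) -> Locale:
    """Split 'language_COUNTRY_VARIANT' into its three parts."""
    parts = tag.split("_", 2)
    # Pad with empty strings so shorter tags still yield three fields.
    parts += [""] * (3 - len(parts))
    language, country, variant = parts
    return Locale(language.lower(), country.upper(), variant)
```

With this sketch, parse_locale("fr_FR_EURO") gives the fields "fr", "FR"
and "EURO", and parse_locale("fr") gives "fr" with an empty country and
variant.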
 

> The modifications to RFC 1766bis to make to better suited for locales are as
> follows:
>
> 1) Normalize to single form when possible. Use ISO 639-1 code instead of
> 639-2 if one exists.

Are you forced to re-tag every bit of data when ISO 639/RA issues a
new code?
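
The concern can be sketched as follows. The table is only a tiny excerpt
of real ISO 639-2 to 639-1 pairs, and normalize_language is a
hypothetical helper. The pair "qaa" -> "qq" is invented purely for
illustration ("qaa" lies in the 639-2 range reserved for local use) and
stands for any language that gains a two-letter code later.

```python
# Illustrative excerpt only; the real ISO 639 tables are much larger.
ISO_639_2_TO_1 = {
    "fra": "fr", "fre": "fr",
    "deu": "de", "ger": "de",
    "eng": "en",
}


def normalize_language(code, table=ISO_639_2_TO_1):
    """Return the 639-1 code if the table has one, else the code itself."""
    code = code.lower()
    return table.get(code, code)


# Today: no two-letter code exists, so data gets tagged "qaa".
tag_today = normalize_language("qaa")

# Later: a (hypothetical) two-letter code is issued for the same language.
later_table = {**ISO_639_2_TO_1, "qaa": "qq"}
tag_later = normalize_language("qaa", later_table)

# The "single normalized form" has changed under existing data.
needs_retagging = tag_today != tag_later
```

Under these assumptions needs_retagging is True: everything already
tagged with the three-letter code no longer matches the normalized form.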

> 3) Variants that are not related to language are locale variants.
> fr_FR_EURO

Can people *please* avoid this abuse of the variant idea?

We are less than 16 months away from the end of the use of FRF. So
16 months from now, the "fr_FR" locale will become completely
indistinguishable from your example. Unless you want to force us to
abandon "fr_FR" and reserve it for tagging obsolete data, but
I can tell you this battle is already lost.
This is a big problem for a draft RFC that will take around,
say, 15 months (;-)) to complete.

Now, if we try to be a bit more clever, the locale that speaks
French and labels monetary amounts in euros should be named
"fr_EU", for anything except very peculiar and very rare uses.
There are as many differences between France's French and Belgian
French as between Scottish English and London English (the most
notable being the use of "octante" instead of "quatre-vingt" for
eighty); and I believe the same holds for the few other similar
cases, like "de_EU" for "de_DE"/"de_AT", "nl_EU" for "nl_NL"/"nl_BE",
and the perhaps more distant "en_EU"/"en_IE"/"en_GB" or
"sv_EU"/"sv_FI"/"sv_SE".
Furthermore, the small countries and the like, such as "LU", "AD",
"SM", "MC" or "VC", for which independent locales would be something
of a joke (I except "lb_LU"), will then be covered easily.
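
As a rough sketch of the idea, a per-country tag could be folded into
the regional one. The "EU" region code, the EURO_AREA set and the
function regional_locale are all assumptions made for illustration; the
set lists the countries that adopted the euro in 1999 and is not meant
as an authoritative table.

```python
# Illustrative membership list only (countries that adopted the euro in 1999).
EURO_AREA = {"AT", "BE", "DE", "ES", "FI", "FR", "IE", "IT", "LU", "NL", "PT"}


def regional_locale(tag: str) -> str:
    """Map 'language_COUNTRY' to 'language_EU' for euro-area countries.

    Tags of any other shape, or for other countries, are returned unchanged.
    """
    language, _, country = tag.partition("_")
    if country.upper() in EURO_AREA:
        return f"{language}_EU"
    return tag
```

Here regional_locale("fr_FR") and regional_locale("fr_BE") both yield
"fr_EU", while regional_locale("fr_CA") stays "fr_CA".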

 
> 5) Convert all non-human locales "C" & "POSIX" to human locales e.g. en_US.

There are BIG differences between "C"/"POSIX" and "en_US".
If you do not see that, then I believe there are big holes
in the intended uses of these new locales.

A major one is that "POSIX" collates in the same order as ASCII,
and I do not believe you are willing to impose this burden
on every user of "en_US"!
The whole point of "C" and "POSIX" (or its big brother "i18n"),
as locales, is to provide certainty of execution in an area where
fuzziness is the rule. And yes, there are cases where this is
much more important than displaying user-friendly dates...
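
A small sketch of the collation difference, with loud caveats: the
first sort is plain code-point order, which for ASCII text is what
"POSIX" gives; the second uses a case-folding key as a crude stand-in
for the order an "en_US" user expects. It is not the real "en_US"
collation, which a proper locale library would have to supply.

```python
words = ["apple", "Zebra", "banana", "Apple"]

# Code-point order: every upper-case ASCII letter sorts before any
# lower-case one, exactly as in the ASCII table.
codepoint_order = sorted(words)
# ['Apple', 'Zebra', 'apple', 'banana']

# Crude dictionary-like order: compare case-insensitively first and
# use the original spelling only to break ties.
dictionary_like = sorted(words, key=lambda w: (w.casefold(), w))
# ['Apple', 'apple', 'banana', 'Zebra']
```

The first order is fully predictable for programs; the second is closer
to what a human reader of English would call sorted.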

Furthermore, I am not at all sure that mapping "C" to "en_US" will
be welcome everywhere (even if C99 now insists that the names
used in full text dates are the English ones). I am not even sure
this is conforming, even assuming the _classical_ "en_US" where
accented characters are considered punctuation.
In any case, the modern, Unicode-conformant definition of "en_US"
will certainly not qualify.

Antoine


