RFC 3066 tags vs. locales (was RE: Common Locale Data Repository Project

From: Peter Constable (petercon@microsoft.com)
Date: Mon Apr 26 2004 - 12:25:08 EDT

Next message: Shawn Steele: "RE: Proposal to add 2 Romanian characters"

Previous message: Michael \(michka\) Kaplan: "Re: [META] Should there be a separate public list for CLDR?"
Next in thread: Michael Everson: "Re: RFC 3066 tags vs. locales (was RE: Common Locale Data Repository Project"
Maybe reply: Michael Everson: "Re: RFC 3066 tags vs. locales (was RE: Common Locale Data Repository Project"
Reply: Mark Davis: "Re: RFC 3066 tags vs. locales (was RE: Common Locale Data Repository Project"

Mark:

I really feel your usage of terminology here is unhelpful -- in very
practical ways, unhelpful, because it makes it more difficult to get
people to understand how to implement things in the right way.

It may be that the application that most interests you is the naming of
locales, but that does not change the fact that the notions of "locale"
and "language" are different, and that the primary intent of RFC 1766
and its successors has always been identification of "languages", as
the title and introduction to RFC 3066 indicate:

"Tags for the Identification of Languages"

"One means of indicating the language used is by labeling the
information content with an identifier for the language that is used in
this information content."

Whether in your broad or narrow sense, a locale is an operational mode
of a software application or of a software operating environment to
provide culture-dependent tailoring.

"Language" in the sense used by RFC 1766/3066 is a
linguistically-related attribute of content, and a language identifier
is used to label content to indicate that attribute, or to label
resources (e.g. spelling checkers) that can appropriately be applied to
that content. I think that's stated reasonably clearly in RFC 1766/3066.

One should also refer to RFC 2277, IETF Policy on Character Sets and
Languages, which clearly distinguishes "language" tags and "locale"
tags. In the IETF context, which is the context for RFC 1766/3066, those
documents do *not* provide tags for locales; they provide tags
for languages.

> There is, as I have said, a perfectly reasonable, narrow sense of
> locale which is essentially identical to what is captured by RFC 3066.

But that does not mean that it's a good thing to refer to RFC 3066 tags
as locale identifiers.

> And in practice, RFC 3066 is often used with that meaning. I don't see
> any need to deny reality (at least not in this area ;-)

I think you overstate actual practice: For many years, various software
implementations have used combinations of ISO 639-1 language identifiers
and ISO 3166 country identifiers joined with an underscore to create
locale identifiers; e.g. "en_US". It was not until Microsoft's .NET
Framework that locales ('CultureInfo' in that context) have been named
using strings that *resemble* RFC 3066 tags -- and it needs to be
pointed out that the namespace for CultureInfo.Name is not the same as
the RFC 3066 namespace.

It may be that you and some others have come to refer to RFC 3066 tags
as "locale" (in some unspecified sense) identifiers, but that
terminology certainly is not used by all. Indeed, as mentioned above, it
is counter to IETF practice as described in RFC 2277.

My contention is that it's unhelpful to refer to RFC 3066 tags as "locale"
tags. I have no problem with *using* RFC 3066 to name certain locales,
or to control the operational mode of software processes in certain
contexts. But saying that RFC 3066 tags are "locale" tags is decidedly
unhelpful in getting people to understand what are appropriate
requirements of implementations. While you may have a conceptualization
that distinguishes between "narrow" and "broad" senses of "locale",
there are at least some software implementers (and I suspect this
applies to most) that only know of "locale", without any distinction of
subtypes. As a result, people inevitably will end up confusing
namespaces for locales with the RFC 3066 namespace. My concern is that
this will lead to problems of interoperation, and will potentially
undermine RFC 3066.

Consider a couple of situations. First, someone needs to define in their
software a locale for (say) US English but with a 24-hour time format.
Yes, that falls in your broad rather than narrow sense of locale, but
there are lots of software implementers out there that don't know the
difference. All they know is that someone they consider knowledgeable in
i18n/g11n issues has referred to RFC 3066 tags as "locale tags". So,
they decide to name their locale "en-US-24hr". Then they write software,
or document their system leading others to write software, that inserts
this name into contexts like xml:lang. We know they shouldn't do it, but
they don't know that; and referring to RFC 3066 as "locale" tagging only
encouraged them to do this. And once they've done it, it can become a
problem that all of us have to work around.
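
To make the first situation concrete, here is a minimal sketch of why a
purely syntactic check would not stop such a name. It assumes only the RFC
3066 grammar (a primary subtag of 1 to 8 letters, then subtags of 1 to 8
letters or digits). The helper name is hypothetical, and the check says
nothing about whether a tag is registered or meaningful as a language
identifier.

```python
import re

# RFC 3066 syntax only: primary subtag is 1-8 letters; each later subtag
# is 1-8 letters or digits; subtags are separated by hyphens.
_RFC3066_SYNTAX = re.compile(r"[A-Za-z]{1,8}(-[A-Za-z0-9]{1,8})*")


def is_wellformed_rfc3066(tag: str) -> bool:
    """Hypothetical helper: checks the syntax, not registration or meaning."""
    return _RFC3066_SYNTAX.fullmatch(tag) is not None


for tag in ("en-US", "en-US-24hr", "en_US"):
    print(tag, is_wellformed_rfc3066(tag))
# en-US True
# en-US-24hr True
# en_US False
```

The invented locale name passes, so nothing at the syntax level tells the
implementer that "en-US-24hr" does not belong in xml:lang.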

Secondly, consider Mongolian. Documents written in Mongolian using
Mongolian script should be tagged (following the provisions of RFC
3066bis) as "mn-Mong". There is no distinction to be made between
whether these documents were written in Mongolia or in PRC. Therefore,
there's no need to tag the documents as "mn-Mong-CN" or "mn-Mong-MN".
But for software locales, this country distinction *is* important. So,
suppose a software implementer names their locale "mn-Mong-MN" and then
assumes they should insert that string into the Accept-Language header
of an HTTP request. There's a better than fair chance that content will
not be returned according to what the user would prefer: what they want
is "mn-Mong", and that's how the content is tagged, but the request that
was sent was overly specific, because the software implementer didn't
understand that the intent of RFC 3066 and the requirements for locales
are not the same.
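
As a rough sketch of why the request fails, assume the server uses the
prefix matching that RFC 2616 describes for Accept-Language: a language
range matches a tag if the two are equal, or if the range is a prefix of
the tag and the next character in the tag is "-". The function name is
hypothetical, and the sketch ignores "*" and quality values.

```python
def range_matches(language_range: str, content_tag: str) -> bool:
    """Prefix matching of an Accept-Language range against a content tag.

    The range matches if it equals the tag, or if it is a prefix of the
    tag and the character right after the prefix is '-'. Comparison is
    case-insensitive.
    """
    r = language_range.lower()
    t = content_tag.lower()
    return t == r or t.startswith(r + "-")


# Content is tagged by language and script only.
print(range_matches("mn-Mong-MN", "mn-Mong"))  # False: request too specific
print(range_matches("mn-Mong", "mn-Mong"))     # True
# The less specific range would also match more specific content.
print(range_matches("mn-Mong", "mn-Mong-MN"))  # True
```

Under that assumption, the locale name "mn-Mong-MN" never matches content
tagged "mn-Mong", whereas the language tag "mn-Mong" matches both.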

So, I will persist in trying to get people to understand that RFC 3066
tags are not "locale" tags, and ask that you not perpetuate confusion
that is out there.

Peter

Peter Constable
Globalization Infrastructure and Font Technologies
Microsoft Windows Division

This archive was generated by hypermail 2.1.5 : Mon Apr 26 2004 - 13:10:28 EDT