From: Mark Davis (mark.davis@jtcsv.com)
Date: Thu May 13 2004 - 15:41:49 CDT
You speak as if date or number formats had nothing to do with language. I very
much disagree. If I have a message that says: "The date of the last version of
this document was 2003年3月20日", nobody in their right mind would say that that is
correct English. (More on that at the end of
http://oss.software.ibm.com/cvs/icu/~checkout~/icuhtml/design/language_code_issues.html,
as I pointed to).
The core of what anyone means by locale is the language -- and that means, in
our context, written language, thus including script (Cyrl vs Latn) and variants
(such as US vs UK spelling). The choice of language affects most of what people
traditionally associate with software globalization, including date, time,
number, and currency formatting & parsing; segmentation (words, lines); collation
and searching; resource bundle choice for translated text & appropriate icons,
etc.
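To illustrate how the language choice drives formatting, here is a minimal Python sketch. The pattern table, the EN_MONTHS list and the format_date function are hypothetical and hand-written for this example; a real system would take such patterns from locale data rather than hard-coding them.

```python
from datetime import date

# Hypothetical, hand-written date patterns keyed by language ID.
# Real patterns would come from locale data, not from a table like this.
DATE_PATTERNS = {
    "en-US": "{month_name} {d}, {y}",
    "en-GB": "{d} {month_name} {y}",
    "ja": "{y}年{m}月{d}日",
}

EN_MONTHS = [
    "January", "February", "March", "April", "May", "June",
    "July", "August", "September", "October", "November", "December",
]


def format_date(value: date, language_id: str) -> str:
    """Format a date using the pattern registered for the language ID."""
    pattern = DATE_PATTERNS[language_id]
    # Fields a pattern does not mention are simply ignored by str.format.
    return pattern.format(
        y=value.year,
        m=value.month,
        d=value.day,
        month_name=EN_MONTHS[value.month - 1],
    )


when = date(2003, 3, 20)
for language_id in ("en-US", "en-GB", "ja"):
    print(language_id, format_date(when, language_id))
```

The same date value yields a different string for each language ID, which is the sense in which the format belongs to the language.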
So if that is all of what someone means by locale, then there is little point in
distinguishing between "locale IDs" and "language IDs".
There are attributes that are clearly orthogonal to language, like choice of
timezone or choice of currency (not the *formatting* of them, but the *choice*).
So if one's locale definition includes something like: language=sh-Cyrl-YU plus
currency=EUR plus timezone=GMT, then that is clearly something far different
than just language.
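As a rough sketch of that broader notion of locale, the following Python fragment keeps the three attributes as independent fields. The LocaleSettings class and the with_language helper are names invented for this illustration, not part of any existing library.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class LocaleSettings:
    """Hypothetical locale record: language plus orthogonal choices."""
    language: str   # language ID, e.g. "sh-Cyrl-YU"
    currency: str   # which currency to use, e.g. "EUR"
    timezone: str   # which timezone to use, e.g. "GMT"


def with_language(settings: LocaleSettings, language: str) -> LocaleSettings:
    """Return a copy with a different language; other fields are untouched."""
    return replace(settings, language=language)


original = LocaleSettings(language="sh-Cyrl-YU", currency="EUR", timezone="GMT")
switched = with_language(original, "en-US")
print(switched)
```

Changing the language changes how amounts and times would be displayed, but the chosen currency and timezone stay the same, which is what makes them orthogonal.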
If that is what someone means by locale, then one must clearly distinguish
between "locale IDs" and "language IDs". Syntactically, locale IDs may be an
extension of language IDs, since they do form the core. Or one could use some
completely different structure. In CLDR, for example, we use RFC 3066 for the
language part (actually an extension, anticipating RFC 3066bis), but then use an
extension mechanism for additional features that are not captured by language.
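The idea of a locale ID as a language ID plus an extension can be sketched as follows. The "@key=value;key=value" syntax and the parse_locale_id function are assumptions made for this illustration only; they are not a statement of the exact CLDR syntax.

```python
from typing import Dict, Tuple


def parse_locale_id(locale_id: str) -> Tuple[str, Dict[str, str]]:
    """Split a locale ID into its language ID and extension attributes.

    Assumed (hypothetical) syntax: <language-id>[@key=value[;key=value]...]
    """
    language_id, _, extension = locale_id.partition("@")
    attributes: Dict[str, str] = {}
    for item in extension.split(";"):
        if not item:
            continue  # no extension at all, or a stray separator
        key, separator, value = item.partition("=")
        if not separator:
            raise ValueError(f"expected key=value, got {item!r}")
        attributes[key] = value
    return language_id, attributes


print(parse_locale_id("sh-Cyrl-YU@currency=EUR;timezone=GMT"))
print(parse_locale_id("en-US"))
```

With this shape, a plain language ID is still a valid locale ID: it is just the case where the extension is empty.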
Mark
__________________________________
http://www.macchiato.com
► शिष्यादिच्छेत्पराजयम् ◄
----- Original Message -----
From: "Peter Constable" <petercon@microsoft.com>
To: "Unicode Mailing List" <unicode@unicode.org>
Sent: Thu, 2004 May 13 11:58
Subject: RE: TR35
> > > Moreover, you would never label a document for a
> > > number format in order to determine how automated-formatting
> > > of numbers should be done on the receiving system.
> >
> > You would not label it to determine formatting on the receiving system,
> > but to determine interpretation (parsing) of formatted values in the
> > received data. You need to know what the convention is to interpret the
> > number 123.456 or the date 02/03/04.
>
> But as I pointed out earlier, you cannot know for certain how to
> interpret it unless you know how it was generated; and if it was entered
> manually by a human, you need to know what they were thinking. A locale
> ID cannot tell you that. A locale ID is useful only if the string that's
> received was generated automatically on the originating system (and you
> know that to be the case), but I'm guessing that most of the time when
> that actually happens, that string is going to be an isolated element
> within a data structure.
>
> It is the case that in a significant number of situations the language
> tag of content will include a region ID, and if I encounter a formatted
> number or date string in the content, I can use that to guess what the
> correct interpretation should be. But I'm not sure I'd want to build a
> system for processing business transactions on such assumptions.
>
> Peter
>
> Peter Constable
> Globalization Infrastructure and Font Technologies
> Microsoft Windows Division