RE: New Locale Proposal

From: Carl W. Brown (cbrown@xnetinc.com)
Date: Wed Sep 20 2000 - 12:07:53 EDT


>-----Original Message-----
>From: Keld Jorn Simonsen [mailto:keld@dkuug.dk]
>Sent: Wednesday, September 20, 2000 2:01 AM

>Why don't you just use the notation in ISO/IEC 15897
>- the cultural registry - for this, or the Open Group (UNIX)
>convention? I think there is no need to reinvent the wheel.

When I started working on computers in Brazil in 1960 the ideas of
internationalization were almost non-existent. 8 to 10 years ago these
standards were hot stuff. Now the world is demanding more and we have the
means to deliver better results.

First, you need a locale system that is truly mapped to human cultures.
None of this "C" locale or "MACINTOSH". Second, you need a system that
allows you to categorize your resources. For example, if you want Chinese
Hakka you do not want to respecify the entire set of Chinese resources, just
the Hakka differences. You cannot do that with a standard like 15897. With
15897 each locale must be fully specified. The proposal would allow you to
group sub-languages under the mother tongue to save resources.
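
As a rough sketch of the grouping idea, here is how a lookup that falls back
from a sub-language to its mother tongue might work. Everything in it is an
assumption made for illustration: the "chinese/hakka" identifier style, the
RESOURCES table, the placeholder strings and the parent_of() and lookup()
helpers are hypothetical and are not the notation of the proposal.

```python
# Hypothetical resource tables. The sub-language table holds only the
# entries that differ from the mother tongue; the values are placeholders.
RESOURCES = {
    "chinese": {
        "greeting": "<Chinese greeting>",
        "date_format": "<Chinese date format>",
    },
    "chinese/hakka": {
        "greeting": "<Hakka greeting>",
    },
}


def parent_of(locale_id):
    """Return the parent of a path-like locale id, or None at the root."""
    head, sep, _ = locale_id.rpartition("/")
    return head if sep else None


def lookup(resources, locale_id, key):
    """Search the locale, then each ancestor in turn, for the given key."""
    current = locale_id
    while current is not None:
        table = resources.get(current, {})
        if key in table:
            return table[key]
        current = parent_of(current)
    raise KeyError(f"{key!r} not found for {locale_id!r} or its parents")


if __name__ == "__main__":
    # Overridden in the Hakka table.
    print(lookup(RESOURCES, "chinese/hakka", "greeting"))
    # Not overridden, so it is inherited from the mother tongue.
    print(lookup(RESOURCES, "chinese/hakka", "date_format"))
```

With a fully specified locale, the Hakka table would have to repeat every
Chinese entry. Here it carries only the differences.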

When these standards came out it was assumed that if you translated into
Spanish that was that. Now you might want to have a primary Spanish and
possibly one or two sub-languages. Languages like Spanish are not too bad,
but Korean, Chinese and Japanese take tremendous resources.

They also did not take into account that a locale needs not only collating
tables and the like but also things like word-breaking dictionaries. Each
language now takes about 64K by reusing resources. Hopefully by using a
better locale system and other techniques, this can be reduced.

Another example: currently we are debating Turkic languages. 15897 is
a code-page-based standard. Our locale, being Unicode oriented, is devoid of
code page requirements. Therefore changing Tatar to use a Latin script does
not affect the locale. In fact it does not change any Unicode processing if,
as we are discussing, you handle the case shifting properly. In fact they
can use a mixture of Cyrillic and Latin scripts. If Turkmen decides to follow
suit and shift to a Latin script the same would apply. Turkmen currently
uses Cyrillic and Arabic scripts.
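
To illustrate what handling the case shifting properly can involve, here is a
minimal sketch. It assumes the Turkish-style dotted and dotless i convention
(i pairs with U+0130, U+0131 pairs with I), which may not apply to every
Turkic Latin orthography. The names turkic_upper() and turkic_lower() are
hypothetical; only str.maketrans, str.translate, str.upper and str.lower are
standard Python.

```python
# Assumed convention: dotted i <-> U+0130, dotless U+0131 <-> I.
_TURKIC_UPPER = str.maketrans({"i": "\u0130", "\u0131": "I"})
_TURKIC_LOWER = str.maketrans({"I": "\u0131", "\u0130": "i"})


def turkic_upper(text):
    """Uppercase text, keeping the dot on dotted i."""
    # Map the two special letters first, then let the default Unicode
    # case mapping handle everything else, including Cyrillic.
    return text.translate(_TURKIC_UPPER).upper()


def turkic_lower(text):
    """Lowercase text, mapping dotless capital I to dotless small i."""
    return text.translate(_TURKIC_LOWER).lower()


if __name__ == "__main__":
    print(turkic_upper("tatar tili"))
    print(turkic_lower("TATAR T\u0130L\u0130"))
    # Cyrillic text goes through the default mapping unchanged in behavior,
    # so mixed-script strings need no code page switch.
    print(turkic_upper("\u0442\u0430\u0442\u0430\u0440 tili"))
```

The only language-specific part is the two-entry table. The rest is ordinary
Unicode case mapping, which is why a change of script does not disturb the
locale itself.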

The locale system is designed so that users can implement locales with
subtleties that go beyond the 433 ISO 639 languages, and so that it is
platform independent.

Carl


