Re: New Locale Proposal

From: Keld Jørn Simonsen (keld@dkuug.dk)
Date: Thu Sep 21 2000 - 06:23:36 EDT


On Wed, Sep 20, 2000 at 07:53:20AM -0800, Carl W. Brown wrote:
> >-----Original Message-----
> >From: Keld Jorn Simonsen [mailto:keld@dkuug.dk]
> >Sent: Wednesday, September 20, 2000 2:01 AM
>
> >Why don't you just use the notation in ISO/IEC 15897
> >- the cultural registry - for this, or the Open Group (UNIX)
> >convention? I think there is no need to reinvent the wheel.
>
> When I started working on computers in Brasil in 1960 the ideas of
> internationalization were almost non-existent. 8 to 10 years ago these
> standards were hot stuff. Now the world is demanding more and we have the
> means to deliver better results.

Boy, that is some time ago. I think the C locale model was invented
around 1990, somewhat later than when you first started on this.

> Firstly you need a locale system that is truly mapped to human cultures.
> None of this "C" locale or "MACINTOSH". Second you need a system that
> allows you to categorize your resources. For example if you want Chinese
> Hakka you do not want to respecify the entire Chinese resources but just the
> Hakka differences. You can not do that with a standard like 15897. With
> 15897 each locale must be fully specified. The proposal would allow you to
> group sub languages under the mother tongue to save resources.

You can specify 15897 locales as copies of other locales, so in effect
this is along the lines of what you are requesting. For your Hakka example
you specify the specifics for Hakka, and then just copy the rest from the
Chinese specs.
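
To illustrate the copy idea, here is a minimal sketch in Python. It is not
15897 syntax and not any real library's API: the locale names, category
contents and the resolve_category helper are all hypothetical, and the "copy"
key only mimics the spirit of the copy keyword found in POSIX-style locale
sources. A category either carries its own data or points at the locale it
is copied from.

```python
# Hypothetical locale data; names and values are illustrative only.
LOCALES = {
    "zh": {
        "LC_COLLATE": {"order": "zh-default"},
        "LC_TIME": {"d_fmt": "%Y-%m-%d"},
        "LC_MONETARY": {"int_curr_symbol": "CNY "},
    },
    "zh_hakka": {
        # Only the Hakka differences are spelled out.
        "LC_COLLATE": {"order": "hakka-specific"},
        # Everything else is copied from the Chinese specs.
        "LC_TIME": {"copy": "zh"},
        "LC_MONETARY": {"copy": "zh"},
    },
}


def resolve_category(locales, name, category):
    """Follow "copy" references until a category with real data is found."""
    seen = set()
    while True:
        if name in seen:
            raise ValueError("circular copy chain at locale %r" % name)
        seen.add(name)
        data = locales[name][category]
        if "copy" not in data:
            return data
        name = data["copy"]


if __name__ == "__main__":
    print(resolve_category(LOCALES, "zh_hakka", "LC_COLLATE"))
    print(resolve_category(LOCALES, "zh_hakka", "LC_TIME"))
```

With this layout the Hakka entry stores only what differs, and a lookup for
any other category falls through to the Chinese data.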

> They also did not take into account things beyond collating tables and the
> like, such as word breaking dictionaries. Each language now takes about
> 64K by reusing resources. Hopefully by using a better locale system and
> other techniques, this can be reduced.

You can also specify line breaking stuff with 15897.

> Another example. Currently we are debating Turkic languages. The 15897 is
> a code page based standard. Our locale being Unicode oriented is devoid of
> codepage requirements. Therefore changing Tatar to use a Latin script does
> not affect the locale. In fact it does not change any Unicode processing if,
> as we are discussing, you handle the case shifting properly. In fact they
> can use a mixture of Cyrillic & Latin scripts. If Turkmen decides to follow
> suit and shift to a Latin script the same would apply. Turkmen currently
> uses Cyrillic and Arabic scripts.

15897 is built on 10646. But it has provisions so that you can also use it
with specific code pages. So it is not only platform independent, but also
encoding independent.

> The locale system is designed so that users can implement locales with
> subtleties that go beyond the 433 ISO 639 languages as well as be platform
> independent.

That is also what 15897 does, and it is already an ISO standard.

Kind regards
Keld


