From: Jukka K. Korpela (jkorpela@cs.tut.fi)
Date: Tue Nov 15 2005 - 14:11:02 CST
On Tue, 15 Nov 2005, Marc Bruguières wrote:
>> There is the family name du Roscoät. This is the only example I can come
>> up with.
>
> I fail to understand why is this important for Unicode or truly
> internationalized software?
It is somewhat marginal, and I'm afraid this discussion has largely taken
a wrong path. The first problem should not be the specification of
character collections for different locales - this should take place in
appropriate national and coöperational fora - but the discussion of
a) which collections should be present in the CLDR (and perhaps whether
each is required, recommended, or purely optional)
b) how these collections are defined: what they mean and how they are
used.
Just knowing which characters appear in a given language is interesting
trivia. But how will it be used? Different uses may require
different types of collections.
> In real life, French (English, German, Arabic, etc.) texts will contain
> many more characters than those in this list: all kinds of dashes,
> quotes, symbols, not to mention mathematical symbols, texts from other
> scripts, etc.
Indeed. I see no reason to omit punctuation. In fact, punctuation is often
more important than individual letters. When desired, collections limited
to letters can be formed from the primary data, e.g. simply by
taking a collection and selecting characters with a General Category value
that indicates a letter.
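
As a rough illustration of that filtering step, here is a minimal Python
sketch. It assumes a collection is simply a set of characters; the name
letters_only and the sample data are made up for this example and are not
taken from the CLDR.

```python
import unicodedata

def letters_only(collection):
    """Keep the characters whose General Category is a letter (Lu, Ll, Lt, Lm, Lo)."""
    return {ch for ch in collection
            if unicodedata.category(ch).startswith("L")}

# Hypothetical sample collection: some letters plus punctuation.
sample = set("a\u00e0\u00e2e\u00e9\u00e8o\u00f4") | set("\u00ab\u00bb-,.;:!?")

# Print the letters that survive the filter, in code point order.
print(sorted(letters_only(sample)))
```

The same approach would work for other subsets, e.g. testing for a
category starting with "P" to pick out punctuation.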
> I thought Unicode was supposed to open up all characters to us, not
> restrict us to small sets
Well, actually, both. Conformance to the Unicode standard does not require
support for all characters; in fact, an implementation might support a
fairly limited repertoire. But most importantly, the collections of
characters used in different languages help in many ways, e.g. in checking
input data consistency, in selecting (non-Unicode) encodings that would be
feasible, in selecting the fonts that can be used, in designing fonts,
in considering which characters should be easily produced (when designing
keyboard layout or input mechanisms in general), in text scanning, etc.
But the different needs may imply a need for different types of information.
I'm afraid the classification into exemplarChars and auxiliary exemplarChars
is too coarse (and the names are misleading).
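
Two of the uses mentioned above, checking input data and selecting feasible
non-Unicode encodings, could be sketched in Python roughly as follows. This
again assumes a collection is a plain set of characters; the function names
and the sample collection are hypothetical, and a real check would need
properly defined collections.

```python
def unexpected_characters(text, collection):
    """Return the characters of text that are not in the collection."""
    return {ch for ch in text if ch not in collection}

def feasible_encodings(collection, candidates):
    """Return the candidate encodings that can represent every character."""
    probe = "".join(sorted(collection))
    feasible = []
    for name in candidates:
        try:
            probe.encode(name)
        except UnicodeEncodeError:
            continue  # at least one character is not representable
        feasible.append(name)
    return feasible

# Hypothetical collection: basic lowercase letters, space, and two extras.
collection = set("abcdefghijklmnopqrstuvwxyz ") | {"\u00e4", "\u0153"}

# Report characters in the input that fall outside the collection.
print(unexpected_characters("du Rosco\u00e4t", collection))
# Report which legacy encodings cover the whole collection.
print(feasible_encodings(collection, ["iso-8859-1", "iso-8859-15", "cp1252"]))
```

Other uses, such as font selection or keyboard design, would probably need
more information per character than simple membership in a set.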
-- Jukka "Yucca" Korpela, http://www.cs.tut.fi/~jkorpela/