RE: New Name Registry Using Unicode

From: Carl W. Brown (cbrown@xnetinc.com)
Date: Mon Oct 02 2000 - 09:42:15 EDT


Marco,

>From: Marco.Cimarosti@icl.com [mailto:Marco.Cimarosti@icl.com]
>Sent: Friday, September 29, 2000 1:26 AM

>tom@bluesky.org wrote:
>> In XNS 1.0, XNS personal, business, and general names all
>> follow the same normalization rules:

>These normalization rules only work for ASCII, so why bother using Unicode?

>After all, they can all keep on using ASCII (cmp.
>http://www.trigeminal.com/samples/provincial.html).

>> 1.
>> Names can be up to 64 characters of XML text (Unicode 2.0
>> characters as
>> defined by the W3C XML 1.0 specification).

>I think this means that text is normalized by *composition*, right?

>This means that letters with diacritics will be handled as completely
>different from their base letter. This would be a nightmare for languages
>where diacritics have an "optional status". A few example of these funny
>minority languages: English, Arabic, Italian, Hebrew (add also, e.g., French
>and Spanish, if you consider the old deprecated usage of removing accents in
>uppercase).

>It means that, say, "www.coöperate.ut" and "www.cooperate.ut" would be
>considered as different names, which is certainly not what most users want.

>A better choice, IMHO, would be to normalize by *decomposition*. In this
>way, the problem above would be addressed by rule 3 below.

I think you have a very good point. This occurred to me also. The question
I could not answer is: which locale do I use, and which normalization rules?

If we can't even do case shifting without a locale (the Turkish dotless ı
and dotted İ), how can we decide what is a letter? If ü = u, then is å = a?
How about ñ = n?
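
To make the two questions above concrete, here is a minimal sketch in Python
using only the standard library. The helper name strip_marks is hypothetical
and is not part of XNS or of any proposal in this thread; it simply shows what
"normalize by decomposition, then ignore the diacritics" would do if applied
with no locale knowledge at all.

```python
import unicodedata


def strip_marks(text: str) -> str:
    """Decompose to NFD, then drop nonspacing combining marks (category Mn).

    Hypothetical helper for illustration only: it knows nothing about
    locales, so every accented letter collapses onto its base letter.
    """
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


# Case shifting without a locale: Python applies the default Unicode
# mappings, which are not what a Turkish reader expects.
lower_capital_i = "I".lower()        # 'i'; Turkish would want dotless U+0131
upper_dotless_i = "\u0131".upper()   # 'I'
lower_dotted_i = "\u0130".lower()    # 'i' followed by U+0307 (combining dot)

# Letter equality without a locale: all three fold the same way, whether
# or not the language in question treats the mark as optional.
for ch in ("\u00fc", "\u00e5", "\u00f1"):   # u-diaeresis, a-ring, n-tilde
    print(f"U+{ord(ch):04X} -> {strip_marks(ch)}")

print(strip_marks("www.co\u00f6perate.ut"))
```

Under this sketch the two spellings of "cooperate" do compare equal, but so do
å and a, and ñ and n, which is exactly the decision that cannot be made
without knowing the language.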

The problem is that there is no easy solution. It might be part of the
Danes' inherent good humor to start and end their alphabet with the letter a,
but they won't think it is funny to change æ to ae, ø to o, or å to a.
Likewise, the Vietnamese â is a letter in its own right, whereas in most
languages the circumflex is just an accent.
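
As a rough illustration of why decomposition alone does not settle this, the
following Python sketch lists the canonical (NFD) decomposition of each of the
letters mentioned above. The function name nfd_parts is an assumption made for
this example. In the Unicode character data, æ and ø have no decomposition at
all, while å and â split into a base letter plus a combining mark, so a purely
mechanical rule would treat the three Danish letters inconsistently.

```python
import unicodedata


def nfd_parts(ch: str) -> list[str]:
    """Return the Unicode names of the code points in the NFD form of ch.

    Hypothetical helper for illustration only.
    """
    return [unicodedata.name(c) for c in unicodedata.normalize("NFD", ch)]


# ae ligature, o with stroke, a with ring, a with circumflex
for ch in ("\u00e6", "\u00f8", "\u00e5", "\u00e2"):
    print(f"U+{ord(ch):04X}", nfd_parts(ch))
```

Nothing in the decomposition data says whether the mark is an optional accent
or part of a distinct letter; that information would have to come from
somewhere else, such as a per-language rule.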

Carl



This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:21:14 EDT