>From: Marco.Cimarosti@icl.com [mailto:Marco.Cimarosti@icl.com]
>Sent: Friday, September 29, 2000 1:26 AM
>> In XNS 1.0, XNS personal, business, and general names all
>> follow the same normalization rules:
>These normalization rules only work for ASCII, so why bother using Unicode?
>After all, they can all keep on using ASCII (cmp.
>> Names can be up to 64 characters of XML text (Unicode 2.0
>> characters as
>> defined by the W3C XML 1.0 specification).
>I think this means that text is normalized by *composition*, right?
>This means that letters with diacritics will be handled as completely
>different from their base letter. This would be a nightmare for languages
>where diacritics have an "optional status". A few examples of these funny
>minority languages: English, Arabic, Italian, Hebrew (add also, e.g.,
>and Spanish, if you consider the old deprecated usage of removing accents
>It means that, say, "www.coöperate.ut" and "www.cooperate.ut" would be
>considered as different names, which is certainly not what most users want.
>A better choice, IMHO, would be to normalize by *decomposition*. In this
>way, the problem above would be addressed by rule 3 below.
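
To make the composition versus decomposition point concrete, here is a minimal Python sketch. The rule 3 that Marco refers to is not shown in this message, so the sketch assumes it amounts to dropping combining marks after decomposition; `fold_marks` is a hypothetical helper name, not anything defined by XNS.

```python
import unicodedata

def fold_marks(name: str) -> str:
    """Decompose to NFD, then drop nonspacing combining marks (category Mn).

    Assumption: this stands in for the unseen "rule 3"; it is not the XNS rule itself.
    """
    decomposed = unicodedata.normalize("NFD", name)
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")

precomposed = "www.co\u00f6perate.ut"   # ö as a single code point
combining = "www.coo\u0308perate.ut"    # o followed by COMBINING DIAERESIS
plain = "www.cooperate.ut"

# Composition unifies the two spellings of ö, but keeps ö distinct from o.
nfc_same_spelling = unicodedata.normalize("NFC", combining) == precomposed
nfc_matches_plain = unicodedata.normalize("NFC", precomposed) == plain

# Decomposition followed by mark removal folds all three names together.
folded_equal = fold_marks(precomposed) == fold_marks(combining) == plain
```

With composition alone the name with ö and the name with plain o stay distinct; only the extra mark-removal step makes them compare equal.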
I think you have a very good point. This occurred to me also. The question
I could not answer is what locale do I use? What normalization rules do I
If we can't even do case shifting without a locale (the Turkish dotless ı
and dotted İ), how can we decide what is a letter? If ü = u, then is å = a?
How about ñ = n?
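
The Turkish case problem can be seen in a few lines of Python, whose built-in `str.upper()` and `str.lower()` apply the locale-independent Unicode mappings. The helpers `upper_tr` and `lower_tr` are hypothetical names for illustration only, and they handle just the two special i pairs, not Turkish casing in general.

```python
dotless_i = "\u0131"   # ı LATIN SMALL LETTER DOTLESS I
dotted_I = "\u0130"    # İ LATIN CAPITAL LETTER I WITH DOT ABOVE

# Locale-independent mappings: correct for English, wrong for Turkish.
default_upper = "i".upper()                   # "I", where Turkish expects İ
default_lower = "I".lower()                   # "i", where Turkish expects ı
round_trip = dotless_i.upper().lower()        # ı -> I -> i, the letter changes

def upper_tr(s: str) -> str:
    """Hypothetical Turkish-aware uppercase: remap i and ı before the default mapping."""
    return s.replace("i", dotted_I).replace(dotless_i, "I").upper()

def lower_tr(s: str) -> str:
    """Hypothetical Turkish-aware lowercase: remap I and İ before the default mapping."""
    return s.replace("I", dotless_i).replace(dotted_I, "i").lower()
```

The same input string therefore needs different output depending on the language it is in, which is information a name registry usually does not have.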
The problem is that there is no easy solution. It might be part of the
Danes' inherent good humor to start and end their alphabet with the letter a, but
they won't think it is funny to change æ to ae, ø to o or å to a. Likewise, the
Vietnamese â is a letter, whereas in most languages the circumflex is an
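
As a rough illustration of why a mechanical rule is unsatisfying here, the following Python sketch checks what decomposition plus mark removal does to the letters mentioned in this thread. `base_after_decomposition` is a hypothetical helper, and the mark-stripping step is an assumption about what such a rule would do, not something taken from the XNS specification.

```python
import unicodedata

def base_after_decomposition(ch: str) -> str:
    """Return ch after NFD decomposition with nonspacing marks (Mn) removed."""
    decomposed = unicodedata.normalize("NFD", ch)
    return "".join(c for c in decomposed if unicodedata.category(c) != "Mn")

letters = [
    "\u00fc",  # ü
    "\u00e5",  # å
    "\u00f1",  # ñ
    "\u00e2",  # â
    "\u00e6",  # æ
    "\u00f8",  # ø
]

# Map each letter to what it becomes under the folding rule.
folded = {ch: base_after_decomposition(ch) for ch in letters}

# Letters the rule leaves unchanged because they have no canonical decomposition.
unchanged = [ch for ch in letters if folded[ch] == ch]
```

In Unicode, ü, å, ñ and â decompose into a base letter plus a combining mark, so they fold to u, a, n and a, while æ and ø have no canonical decomposition and pass through untouched. The rule thus treats å differently from ø and merges Vietnamese â with a, regardless of what speakers of either language would expect.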