RE: New Name Registry Using Unicode

From: Carl W. Brown
Date: Mon Oct 02 2000 - 09:42:15 EDT


>From: []
>Sent: Friday, September 29, 2000 1:26 AM

> wrote:
>> In XNS 1.0, XNS personal, business, and general names all
>> follow the same normalization rules:

>These normalization rules only work for ASCII, so why bother using Unicode?

>After all, they can all keep on using ASCII (cmp.

>> 1. Names can be up to 64 characters of XML text (Unicode 2.0
>> characters as defined by the W3C XML 1.0 specification).
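Whichever normalization form is chosen also changes what a 64-character limit means. A minimal Python sketch, assuming the limit is counted in Unicode code points (the unit XML 1.0 characters are defined over); `MAX_NAME_LENGTH`, `name_length` and `fits_limit` are hypothetical names for illustration, not part of XNS:

```python
import unicodedata

# Hypothetical limit; assumes "characters" means Unicode code points.
MAX_NAME_LENGTH = 64

def name_length(name: str, form: str) -> int:
    """Code-point length of `name` after normalizing to `form` (e.g. "NFC", "NFD")."""
    return len(unicodedata.normalize(form, name))

def fits_limit(name: str, form: str) -> bool:
    """True if the normalized name is within MAX_NAME_LENGTH code points."""
    return name_length(name, form) <= MAX_NAME_LENGTH

word = "co\u00f6perate"               # "coöperate" with precomposed ö
print(name_length(word, "NFC"))       # 9
print(name_length(word, "NFD"))       # 10: ö becomes o + U+0308

accented = "\u00e9" * 64              # 64 precomposed é
print(fits_limit(accented, "NFC"))    # True
print(fits_limit(accented, "NFD"))    # False: 128 code points
```

So the same name can be legal under composition and too long under decomposition, unless the spec says which form the limit applies to.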

>I think this means that text is normalized by *composition*, right?

>This means that letters with diacritics will be handled as completely
>different from their base letter. This would be a nightmare for languages
>where diacritics have an "optional status". A few examples of these funny
>minority languages: English, Arabic, Italian, Hebrew (add also, e.g.,
>and Spanish, if you consider the old deprecated usage of removing accents

>It means that, say, "www.coöperate.ut" and "www.cooperate.ut" would be
>considered as different names, which is certainly not what most users want.

>A better choice, IMHO, would be to normalize by *decomposition*. In this
>way, the problem above would be addressed by rule 3 below.
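Decomposition-based matching could look roughly like the following. This is a sketch using Python's standard `unicodedata` module, not the XNS rule 3 (which is not quoted here); `fold_diacritics` is a hypothetical helper that decomposes to NFD, drops every character with a nonzero canonical combining class, and recomposes:

```python
import unicodedata

def fold_diacritics(name: str) -> str:
    """Decompose to NFD, drop combining marks, recompose to NFC."""
    decomposed = unicodedata.normalize("NFD", name)
    # combining() returns the canonical combining class; nonzero means a combining mark.
    base_only = "".join(ch for ch in decomposed if unicodedata.combining(ch) == 0)
    return unicodedata.normalize("NFC", base_only)

a = "www.co\u00f6perate.ut"  # www.coöperate.ut
b = "www.cooperate.ut"

# Composition alone keeps the two names distinct...
print(unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b))  # False
# ...whereas decomposing and dropping the marks makes them match.
print(fold_diacritics(a) == fold_diacritics(b))  # True
```

Filtering on combining class is a heuristic: some spacing marks have class 0 and would survive.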

I think you have a very good point. It occurred to me too. The question
I could not answer is: which locale do I use? What normalization rules do I
If we can't even do case shifting without a locale (the Turkish dotless ı
and dotted İ), how can we decide what is a letter? If = u then is = a.
How about = n?
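The Turkish case problem is easy to reproduce with any locale-independent case mapping. In this sketch, Python's `str.lower()` and `str.upper()` stand in for what a registry without locale data would do; `default_lower` and the sample word "KIRIK"/"kırık" are illustrative assumptions:

```python
def default_lower(name: str) -> str:
    """Lowercase with Unicode's default (non-Turkish) case mappings."""
    # Python's str.lower() has no locale parameter, so "I" always maps to "i".
    return name.lower()

DOTTED_CAPITAL_I = "\u0130"  # İ
DOTLESS_SMALL_I = "\u0131"   # ı

print(default_lower("KIRIK") == "k\u0131r\u0131k")  # False: Turkish expects "kırık"
print(default_lower(DOTLESS_SMALL_I.upper()) == DOTLESS_SMALL_I)  # False: ı -> I -> i
print([hex(ord(c)) for c in default_lower(DOTTED_CAPITAL_I)])  # ['0x69', '0x307']
```

The last line shows that İ does not even lowercase to a single character under the default mapping; it becomes i plus U+0307 COMBINING DOT ABOVE.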

The problem is that there is no easy solution. It might be part of the
Danes' inherent good humor to start their alphabet with a and end it with å,
but they won't think it is funny to change æ to ae, ø to o or å to a. Like the
Vietnamese letter is a letter where in most languages the circumflex is an
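Decomposition does not settle the Danish or Vietnamese cases either. In this Python sketch (`strip_marks` is a hypothetical helper), only å has a canonical decomposition, so stripping marks folds å into a while leaving æ and ø untouched, and Vietnamese ô collapses into o:

```python
import unicodedata

def strip_marks(text: str) -> str:
    """Remove combining marks after canonical decomposition (NFD)."""
    return "".join(
        ch for ch in unicodedata.normalize("NFD", text)
        if unicodedata.combining(ch) == 0
    )

print(strip_marks("\u00e6\u00f8\u00e5"))  # æøa: only å decomposes (a + U+030A)
print(strip_marks("\u00c5") == "A")       # True: Å is folded into A
print(strip_marks("\u00f4"))              # o: Vietnamese ô loses its letter identity
```

The result is inconsistent from a Danish point of view: one of the three extra letters is folded away and the other two are not.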


This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:21:14 EDT