From: Philippe Verdy (verdy_p@wanadoo.fr)
Date: Mon Dec 15 2003 - 12:09:38 EST
> -----Original Message-----
> From: Doug Ewell [mailto:dewell@adelphia.net]
> Sent: Monday, December 15, 2003 17:32
> To: Unicode Mailing List
> Cc: verdy_p@wanadoo.fr
> Subject: Re: Case mapping of dotless lowercase letters
>
>
> Philippe Verdy <verdy underscore p at wanadoo dot fr> wrote:
>
> > I would have expected to find these mappings:
> >
> > 0131; F; 0069; # LATIN SMALL LETTER DOTLESS I
> > -> LATIN SMALL LETTER I
> > 0131; T; 0131; # LATIN SMALL LETTER DOTLESS I
> > -> LATIN SMALL LETTER DOTLESS I
> >
> > The rationale being that the locale-neutral mappings would not
> > differentiate the "standard" small letter (soft-dotted) i, and the
> > "Turkic" small letter dotless i, for the same reason that they do not
> > differentiate their uppercase versions; and that the "Turkic" mappings
> > should maintain this difference in both lowercase and uppercase pairs
> > of letters.
>
> Turkish and Azeri (and others) can only be cased correctly with
> locale-specific mappings. The locale-neutral mappings cannot be
> expected to consider U+0069 'i' and U+0131 'ı' equivalent, with all the
> ambiguities that would bring. As you point out, 'i' and 'ı' are quite
> different letters.
I agree with your argument about the difference between dotted and dotless
letters. My point is that the current case mappings behave asymmetrically:
the lowercase mappings keep the two letters distinct, but the uppercase
mappings do not.
The consequence is that two words that compare distinct after lowercasing or
case folding will no longer compare distinct once they are converted to
uppercase with the default locale-neutral full mappings (this problem does
not occur with the Turkic-specific full case mappings). That is all I am
saying: I don't want to reform the case mappings for Turkic languages, just
to point out a caveat in the default locale-neutral mappings.
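
To make the caveat concrete, here is a small sketch in Python, whose
built-in string methods apply the locale-neutral Unicode mappings. The two
sample strings are illustrative only and are not taken from any real data.

```python
# Two strings that differ only in dotted i (U+0069) vs dotless i (U+0131).
dotted = "kirk"
dotless = "k\u0131rk"

# Lowercase comparison and default case folding keep them distinct.
distinct_as_written = dotted != dotless
distinct_when_folded = dotted.casefold() != dotless.casefold()

# Default uppercasing maps both i and dotless i to U+0049, so they collide.
same_when_uppercased = dotted.upper() == dotless.upper()

print(distinct_as_written, distinct_when_folded, same_when_uppercased)
```

A comparison that normalizes by folding and one that normalizes by
uppercasing can therefore disagree about whether these two strings are the
same identifier.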
In practice, I had to add these two mappings to my application because the
default (locale-neutral) case mappings caused identity problems, with
security implications (the Turkic case mappings are still available as an
option for documents explicitly labelled with the "tr" or "az" locales). The
same applies to IDNA and to case-insensitive filesystems, which must also be
locale-neutral and thus need to remove the difference between soft-dotted
letters and dotless letters.
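
As a rough sketch of such an application-level override (not the actual
code from my application), the function below layers the extra mapping on
top of Python's default case folding. The name `fold` and the `turkic` flag
are hypothetical; the Turkic branch applies the existing "T" entries of
CaseFolding.txt (I to dotless i, capital I with dot above to i).

```python
DOTLESS_I = "\u0131"      # LATIN SMALL LETTER DOTLESS I
CAPITAL_I_DOT = "\u0130"  # LATIN CAPITAL LETTER I WITH DOT ABOVE


def fold(text: str, turkic: bool = False) -> str:
    """Hypothetical matching key; 'turkic' selects the tr/az behaviour."""
    if turkic:
        # Existing "T" foldings: I -> dotless i, capital I with dot -> i.
        # Dotless i is left alone, so it stays distinct from i.
        text = text.replace("I", DOTLESS_I).replace(CAPITAL_I_DOT, "i")
        return text.casefold()
    # Assumed extra locale-neutral mapping: dotless i -> i after folding.
    return text.casefold().replace(DOTLESS_I, "i")


# Locale-neutral: all three spellings produce the same key.
print(fold("k\u0131rk") == fold("kirk") == fold("KIRK"))
# Turkic: the dotted and dotless spellings remain distinct.
print(fold("k\u0131rk", turkic=True) != fold("kirk", turkic=True))
```

With this key, matching by folding and matching by uppercasing agree in the
locale-neutral case, while text labelled "tr" or "az" keeps the distinction.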
Are case foldings covered by the stability policy? Could an "F" entry be
added to the CaseFolding.txt file for the default (locale-neutral) full
mappings, with a "T" entry overriding it for Turkic languages, where the
dotless lowercase i would map to itself?
This archive was generated by hypermail 2.1.5 : Mon Dec 15 2003 - 12:54:55 EST