RE: Case mapping of dotless lowercase letters

From: Philippe Verdy (verdy_p@wanadoo.fr)
Date: Mon Dec 15 2003 - 12:09:38 EST


    > -----Original Message-----
    > From: Doug Ewell [mailto:dewell@adelphia.net]
    > Sent: Monday, December 15, 2003 17:32
    > To: Unicode Mailing List
    > Cc: verdy_p@wanadoo.fr
    > Subject: Re: Case mapping of dotless lowercase letters
    >
    >
    > Philippe Verdy <verdy underscore p at wanadoo dot fr> wrote:
    >
    > > I would have expected to find these mappings:
    > >
    > > 0131; F; 0069; # LATIN SMALL LETTER DOTLESS I
    > > -> LATIN SMALL LETTER I
    > > 0131; T; 0131; # LATIN SMALL LETTER DOTLESS I
    > > -> LATIN SMALL LETTER DOTLESS I
    > >
    > > The rationale being that the locale-neutral mappings would not
    > > differentiate the "standard" small letter (soft-dotted) i, and the
    > > "Turkic" small letter dotless i, for the same reason that they do not
    > > differentiate their uppercase versions; and that the "Turkic" mappings
    > > should maintain this difference in both lowercase and uppercase pairs
    > > of letters.
    >
    > Turkish and Azeri (and others) can only be cased correctly with
    > locale-specific mappings. The locale-neutral mappings cannot be
    > expected to consider U+0069 'i' and U+0131 'ı' equivalent, with all the
    > ambiguities that would bring. As you point out, 'i' and 'ı' are quite
    > different letters.

    I agree with your argument about the difference between dotted and
    dotless letters. My point is that the current case mappings behave
    asymmetrically: the distinction between the two letters is kept among
    lowercase words, but it is lost among uppercase words.

    The consequence is that two words that compare as distinct in lowercase
    will no longer compare as distinct once they are converted to uppercase
    with the default locale-neutral full mappings (this problem does not occur
    with the Turkic-specific full case mappings). That is all I am saying: I do
    not want to reform the case mappings for Turkic languages, only to point
    out a caveat in the default locale-neutral mappings.
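
    To make the caveat concrete, here is a minimal Python sketch. It assumes
    only that Python's built-in str.upper() and str.casefold() apply the
    default locale-neutral Unicode mappings; the sample words and the helper
    name are illustrative choices, not part of any standard.

```python
# Two words that differ only in dotted vs. dotless i (illustrative samples).
dotted = "sinir"              # uses U+0069 LATIN SMALL LETTER I
dotless = "s\u0131n\u0131r"   # uses U+0131 LATIN SMALL LETTER DOTLESS I


def collide_in_uppercase(a: str, b: str) -> bool:
    """Hypothetical helper: True if a and b differ but uppercase identically."""
    return a != b and a.upper() == b.upper()


# The words stay distinct as written and under default case folding...
lower_distinct = dotted != dotless
folded_distinct = dotted.casefold() != dotless.casefold()

# ...but the default (locale-neutral) uppercase mapping merges them.
upper_collide = collide_in_uppercase(dotted, dotless)

print(lower_distinct, folded_distinct, upper_collide)
```

    An identifier check that compares lowercase or case-folded forms would
    therefore treat the two words as different, while one that compares
    uppercase forms would treat them as the same.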

    In practice, I had to add these two mappings to my application, because
    the default (locale-neutral) case mappings caused identity problems, with
    security implications (the Turkic case mappings remain available as an
    option for documents explicitly labelled with the "tr" or "az" locales).
    The same applies to IDNA and to case-insensitive filesystems, which must
    also be locale-neutral and therefore need to remove the difference between
    soft-dotted letters and dotless letters.
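
    As a rough sketch of that kind of application-level folding (not my
    actual code; the function name fold and its turkic flag are hypothetical),
    one could layer the extra mapping on top of Python's built-in
    str.casefold(), and apply the Turkic "T" foldings only on request:

```python
def fold(text: str, turkic: bool = False) -> str:
    """Case-fold text for identity comparison (illustrative sketch).

    Default mode: standard full case folding, plus the extra mapping
    U+0131 -> U+0069 so that dotless and dotted i compare equal.
    Turkic mode: apply the Turkic foldings I -> U+0131 and U+0130 -> i
    first, so the dotted/dotless distinction is preserved.
    """
    if turkic:
        text = text.replace("I", "\u0131").replace("\u0130", "i")
        return text.casefold()
    return text.casefold().replace("\u0131", "i")


# Default mode: all three spellings fold to the same key.
keys = {fold("SINIR"), fold("sinir"), fold("s\u0131n\u0131r")}
print(len(keys))

# Turkic mode: dotted and dotless words remain distinct.
print(fold("sinir", turkic=True) != fold("SINIR", turkic=True))
```

    The default mode deliberately loses information, which is acceptable for
    identity matching but would be wrong for text that is known to be Turkish
    or Azeri; that is why the Turkic mode stays available as an option.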

    Are case foldings covered by the stability policy? Could an "F" entry be
    added to the CaseFolding.txt file for the default (locale-neutral) full
    mappings, together with a "T" entry that overrides it for Turkic
    languages, where the dotless lowercase i would map to itself?
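
    To illustrate how such a pair of entries would be consumed, here is a
    small Python sketch. The two lines in PROPOSED are the hypothetical
    additions discussed in this thread, written in the CaseFolding.txt field
    layout (code; status; mapping; # name); they are not taken from the
    published file, and load and fold_char are made-up helper names.

```python
# Hypothetical entries, in CaseFolding.txt layout: code; status; mapping; # name
PROPOSED = """\
0131; F; 0069; # LATIN SMALL LETTER DOTLESS I
0131; T; 0131; # LATIN SMALL LETTER DOTLESS I
"""


def load(text: str) -> dict:
    """Parse entries into {(char, status): folded_string}."""
    table = {}
    for line in text.splitlines():
        data = line.split("#", 1)[0].strip()   # drop the trailing comment
        if not data:
            continue
        code, status, mapping = [f.strip() for f in data.split(";")[:3]]
        target = "".join(chr(int(cp, 16)) for cp in mapping.split())
        table[(chr(int(code, 16)), status)] = target
    return table


def fold_char(ch: str, table: dict, turkic: bool = False) -> str:
    """Fold one character; a "T" entry overrides "F" only in Turkic mode."""
    if turkic and (ch, "T") in table:
        return table[(ch, "T")]
    return table.get((ch, "F"), ch)


table = load(PROPOSED)
print(fold_char("\u0131", table) == "i")                      # default mode
print(fold_char("\u0131", table, turkic=True) == "\u0131")    # Turkic mode
```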






    This archive was generated by hypermail 2.1.5 : Mon Dec 15 2003 - 12:54:55 EST