Re: Case mapping of dotless lowercase letters

From: Doug Ewell (dewell@adelphia.net)
Date: Mon Dec 15 2003 - 21:46:54 EST

    Philippe Verdy <verdy underscore p at wanadoo dot fr> wrote:

    >> There may be a problem here, but the urgency seems very slight;
    >
    > I detected it after it produced a security bug (a user record was
    > unexpectedly updated in my database...)
    > ...
    >> and dotless lowercase i in non-Turkic languages.
    >
    > Wrong here: I have found occurrences of dotless lowercase i, used
    > instead of soft-dotted lowercase i, as base letters for diacritics
    > added above it (it was an acute accent...)

    Don't do that.

    > There were two sequences that looked identical when rendered, yet
    > compared as distinct after case folding:
    >
    > (1) LATIN SMALL LETTER I, COMBINING ACUTE ACCENT
    > (2) LATIN SMALL LETTER DOTLESS I, COMBINING ACUTE ACCENT
    >
    > but were no longer distinct when converted to uppercase in a
    > locale-neutral environment not using the Turkic rules:
    >
    > (1') LATIN CAPITAL LETTER I, COMBINING ACUTE ACCENT
    > (2') LATIN CAPITAL LETTER I, COMBINING ACUTE ACCENT

    OK, so you want the default, locale-neutral case mapping tables to
    equate U+0069 with U+0131, right?

    This is close to being a spoofing problem, though. See TUS 4.0, page
    141.
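
    For reference, the behavior Philippe describes can be reproduced with
    a minimal Python sketch. It assumes Python's built-in str.casefold()
    and str.upper(), which apply the default, untailored Unicode mappings
    with no Turkic rules; the variable names are illustrative only.

```python
import unicodedata

# Sequence (1): LATIN SMALL LETTER I + COMBINING ACUTE ACCENT
seq1 = "\u0069\u0301"
# Sequence (2): LATIN SMALL LETTER DOTLESS I + COMBINING ACUTE ACCENT
seq2 = "\u0131\u0301"

# Default case folding leaves U+0131 unchanged, so the folded forms differ.
folded_equal = seq1.casefold() == seq2.casefold()

# Default uppercasing maps both U+0069 and U+0131 to U+0049.
upper_equal = seq1.upper() == seq2.upper()

for label, text in (("(1)", seq1), ("(2)", seq2), ("(2')", seq2.upper())):
    print(label, [unicodedata.name(ch) for ch in text])

print("equal after case folding:", folded_equal)  # False
print("equal after uppercasing: ", upper_equal)   # True
```

    So a caseless match built on case folding keeps the two strings
    apart, while one built on uppercasing merges them.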

    > The string (2) may have been produced to avoid displaying the dot
    > with some fonts that don't apply the soft-dotted rule when there's
    > an additional diacritic above...

    Don't do that. That's misusing the standard. The font should be fixed
    instead.

    > For me, strings (1) and (2) are "equivalent" in non-Turkic locale-
    > neutral environments, and should be equal with case-insensitive
    > compares, exactly like (1') and (2'), their uppercase equivalents.
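
    A comparison of the kind requested here could be sketched in Python
    as follows. This is only an illustration under stated assumptions:
    loose_key and loose_equal are hypothetical helper names, not part of
    any standard or library, and the extra mapping of U+0131 to U+0069 is
    an application-level tailoring, not something the default Unicode
    case folding does.

```python
DOTLESS_I = "\u0131"  # LATIN SMALL LETTER DOTLESS I


def loose_key(text: str) -> str:
    """Hypothetical comparison key: map dotless i to plain i, then case fold."""
    return text.replace(DOTLESS_I, "i").casefold()


def loose_equal(a: str, b: str) -> bool:
    """Compare two strings using the hypothetical key above."""
    return loose_key(a) == loose_key(b)


seq1 = "\u0069\u0301"  # (1) i + COMBINING ACUTE ACCENT
seq2 = "\u0131\u0301"  # (2) dotless i + COMBINING ACUTE ACCENT

print(loose_equal(seq1, seq2))          # True
print(loose_equal(seq1.upper(), seq2))  # True
```

    Such a key is unsuitable for Turkic text, where i and dotless i are
    distinct letters. This sketch also does not normalize its input, so a
    precomposed U+00ED would not match either sequence.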

    -Doug Ewell
     Fullerton, California
     http://users.adelphia.net/~dewell/


