From: Doug Ewell (dewell@adelphia.net)
Date: Mon Dec 15 2003 - 21:46:54 EST
Philippe Verdy <verdy underscore p at wanadoo dot fr> wrote:
>> There may be a problem here, but the urgency seems very slight;
>
> I detected it after it produced a security bug (a user record was
> unexpectedly updated in my database...)
> ...
>> and dotless lowercase i in non-Turkic languages.
>
> Wrong here: I have found occurrences of dotless lowercase i, used
> instead of soft-dotted lowercase i, as a base letter for a diacritic
> added above it (it was an acute accent...)
Don't do that.
> There were two sequences that looked identical when rendered but
> were distinct under a case-folding comparison:
>
> (1) LATIN SMALL LETTER I, COMBINING ACUTE ACCENT
> (2) LATIN SMALL LETTER DOTLESS I, COMBINING ACUTE ACCENT
>
> but that were no longer distinct when converted to uppercase in a
> locale-neutral environment not using the Turkic rules:
>
> (1') LATIN CAPITAL LETTER I, COMBINING ACUTE ACCENT
> (2') LATIN CAPITAL LETTER I, COMBINING ACUTE ACCENT
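The asymmetry described above can be reproduced with Python's built-in string methods, which apply the Unicode default (non-Turkic) case mappings and full case folding. A minimal sketch, assuming a current Python 3 interpreter; the names `seq1`, `seq2` and `names` are illustrative only:

```python
import unicodedata

seq1 = "i\u0301"       # (1) LATIN SMALL LETTER I + COMBINING ACUTE ACCENT
seq2 = "\u0131\u0301"  # (2) LATIN SMALL LETTER DOTLESS I + COMBINING ACUTE ACCENT

def names(s: str) -> str:
    """Spell out a string as its Unicode character names."""
    return ", ".join(unicodedata.name(c) for c in s)

# Default case folding has no mapping for U+0131, so (1) and (2) stay distinct.
print(seq1.casefold() == seq2.casefold())  # False

# Default uppercasing maps both U+0069 and U+0131 to U+0049, so (1') == (2').
print(seq1.upper() == seq2.upper())        # True
print(names(seq2.upper()))                 # LATIN CAPITAL LETTER I, COMBINING ACUTE ACCENT
```

So comparing uppercased strings and comparing case-folded strings give different answers for this pair, which is exactly the inconsistency being reported.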
OK, so you want the default, locale-neutral case mapping tables to equate
U+0069 with U+0131, right?
This is close to being a spoofing problem, though. See TUS 4.0, page
141.
> The string (2) may have been produced to avoid displaying the dot
> with some fonts that don't apply the soft-dotted rule when there's
> an additional diacritic above...
Don't do that. That's misusing the standard. The font should be fixed
instead.
> For me, strings (1) and (2) are "equivalent" in non-Turkic, locale-
> neutral environments, and should compare equal case-insensitively,
> exactly like (1') and (2'), their uppercase equivalents.
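What is being asked for amounts to one extra step ahead of case folding: mapping U+0131 to U+0069. A minimal sketch of such a comparison, assuming Python 3; the function name `loose_caseless_equal` is hypothetical, the U+0131 substitution is a non-standard addition, and only the surrounding NFD/casefold/NFD shape follows the Unicode Standard's canonical caseless matching:

```python
import unicodedata

def loose_caseless_equal(a: str, b: str) -> bool:
    """Hypothetical comparison that treats U+0131 DOTLESS I as U+0069 'i'.

    This is NOT Unicode default caseless matching: standard case folding
    keeps U+0131 distinct from U+0069.
    """
    def key(s: str) -> str:
        s = unicodedata.normalize("NFD", s)  # decompose e.g. U+00ED into i + U+0301
        s = s.replace("\u0131", "i")         # the extra, non-standard step
        return unicodedata.normalize("NFD", s.casefold())
    return key(a) == key(b)

print(loose_caseless_equal("i\u0301", "\u0131\u0301"))  # True
print(loose_caseless_equal("\u00ed", "\u0131\u0301"))   # True
print(loose_caseless_equal("i\u0301", "i"))             # False
```

The catch is the one raised earlier in this thread: the same substitution lets a dotless i silently stand in for a dotted i anywhere, which is why it edges toward a spoofing problem rather than a plain case-mapping fix.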
-Doug Ewell
Fullerton, California
http://users.adelphia.net/~dewell/
This archive was generated by hypermail 2.1.5 : Mon Dec 15 2003 - 22:40:11 EST