Re: Case mapping of dotless lowercase letters

From: Doug Ewell (dewell@adelphia.net)
Date: Mon Dec 15 2003 - 21:46:54 EST

Next message: Doug Ewell: "Re: Stability of WG2 (was: Re: [OT] CJK -> CJC)"

Previous message: Philippe Verdy: "RE: Latin Capital Reversed K"
In reply to: Philippe Verdy: "RE: Case mapping of dotless lowercase letters"
Next in thread: Philippe Verdy: "RE: Case mapping of dotless lowercase letters"
Reply: Philippe Verdy: "RE: Case mapping of dotless lowercase letters"

Philippe Verdy <verdy underscore p at wanadoo dot fr> wrote:

>> There may be a problem here, but the urgency seems very slight;
>
> I detected it after it produced a security bug (a user record was
> unexpectedly updated on my database...)
> ...
>> and dotless lowercase i in non-Turkic languages.
>
> Wrong here: I have found occurrences of dotless lowercase i, used
> instead of soft-dotted lowercase i, as base letters for diacritics
> added above it (it was an acute accent...)

Don't do that.

> There were two sequences that looked identical when rendered,
> but that were distinct under a case-folding comparison:
>
> (1) LATIN SMALL LETTER I, COMBINING ACUTE ACCENT
> (2) LATIN SMALL LETTER DOTLESS I, COMBINING ACUTE ACCENT
>
> but were no longer distinct when converted to uppercase in a
> locale-neutral environment not using the Turkic rules:
>
> (1') LATIN CAPITAL LETTER I, COMBINING ACUTE ACCENT
> (2') LATIN CAPITAL LETTER I, COMBINING ACUTE ACCENT

OK, so you want the default, locale-neutral case mapping tables to equate
U+0069 with U+0131, right?

This is close to being a spoofing problem, though. See TUS 4.0, page
141.
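
For reference, the behavior described in the quoted (1)/(2) example can
be reproduced with a short sketch. It assumes Python 3, whose str.upper()
and str.casefold() apply the default (non-Turkic) Unicode mappings; the
variable names are only illustrative.

```python
# Sequences (1) and (2) from the quoted message.
seq1 = "\u0069\u0301"  # LATIN SMALL LETTER I + COMBINING ACUTE ACCENT
seq2 = "\u0131\u0301"  # LATIN SMALL LETTER DOTLESS I + COMBINING ACUTE ACCENT

# Default case folding leaves U+0131 unchanged, so the two stay distinct.
folded_equal = seq1.casefold() == seq2.casefold()

# Default uppercasing maps both U+0069 and U+0131 to U+0049, so they collide.
upper_equal = seq1.upper() == seq2.upper()

print(folded_equal, upper_equal)  # prints: False True
```

So a lookup keyed on folded strings and one keyed on uppercased strings
can disagree about whether these two sequences name the same thing.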

> The string (2) may have been produced to avoid displaying the dot
> with some fonts that don't apply the soft-dotted rule when there's
> an additional diacritic above...

Don't do that. That's misusing the standard. The font should be fixed
instead.

> For me, strings (1) and (2) are "equivalent" in non-Turkic,
> locale-neutral environments, and should be equal under
> case-insensitive comparison, exactly like (1') and (2'), their
> uppercase equivalents.
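
A comparison that behaves the way the quoted paragraph asks might look
roughly like the sketch below. It assumes Python 3; loose_key is a
hypothetical helper, not anything defined by the standard. It uppercases
first, then case-folds, so U+0069 and U+0131 end up with the same key.

```python
def loose_key(s: str) -> str:
    """Hypothetical comparison key: uppercase first, then case-fold.

    Uppercasing maps both U+0069 and U+0131 to U+0049, so the two
    letters become indistinguishable before folding is applied.
    """
    return s.upper().casefold()


seq1 = "\u0069\u0301"  # (1) i + COMBINING ACUTE ACCENT
seq2 = "\u0131\u0301"  # (2) dotless i + COMBINING ACUTE ACCENT

print(loose_key(seq1) == loose_key(seq2))  # prints: True
```

Note that any such key also makes two visibly different identifiers
compare equal, which is the same spoofing exposure mentioned above.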

-Doug Ewell
Fullerton, California
http://users.adelphia.net/~dewell/

This archive was generated by hypermail 2.1.5 : Mon Dec 15 2003 - 22:40:11 EST