From: Richard Wordingham (richard.wordingham@ntlworld.com)
Date: Wed May 09 2007 - 02:55:38 CDT
Subject: RE: Adding Lowercase Letters (was: Uppercase ß is coming? (U+1E9E))
Philippe Verdy wrote on Tuesday, May 08, 2007 at 8:57 PM
> The special casing rules for Turkish do apply to the effect of case mappings
> to lowercase or to uppercase or to titlecase. But do they apply to the case
> folding (which is different from lowercase mapping)?
I can't work out whether the Turkish rules are advisory or mandatory. The
tables, unless you count the comments (and they appear to recommend
non-conformance with subscript iota), are incomplete. The tables for
Turkish do not respect canonical equivalence, as the comments caution. That
said, CaseFolding.txt does address Turkish case folding, in its entries with
status T.
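
To make the CaseFolding.txt point concrete, here is a minimal Python sketch
of Turkic case folding. It assumes the two status-T entries (U+0049 to
U+0131, and U+0130 to U+0069) are applied before the default folding;
`turkic_casefold` and `TURKIC_FOLD` are names invented for this
illustration, not part of any library.

```python
# Status-T entries from CaseFolding.txt (assumed to be the only two):
#   0049; T; 0131   LATIN CAPITAL LETTER I -> dotless i
#   0130; T; 0069   LATIN CAPITAL LETTER I WITH DOT ABOVE -> i
TURKIC_FOLD = str.maketrans({"\u0049": "\u0131", "\u0130": "\u0069"})


def turkic_casefold(s: str) -> str:
    """Apply the Turkic overrides first, then Python's default case folding."""
    return s.translate(TURKIC_FOLD).casefold()


print(turkic_casefold("ISTANBUL"))        # starts with dotless i (U+0131)
print("ISTANBUL".casefold())              # default folding: plain "istanbul"

# Canonical equivalence is not respected: U+0130 and <U+0049, U+0307> are
# canonically equivalent, yet they fold to different strings here.
print(turkic_casefold("\u0130") == turkic_casefold("I\u0307"))  # False
```

The last line shows the caveat from the comments in the tables: unless the
input is normalized first, the precomposed and decomposed forms of the
dotted capital I fold differently.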
> I'd like also to find a precise reply to this question:
> Are the strings resulting from a case mapping to uppercase (or to lowercase,
> or to titlecase) required to have the same case folding? Id est:
> Are we guaranteed to have, with existing normative Unicode definitions and
> stability rules, for every string S in a locale L, the following equalities
> starting at some current or past version of the Unicode standard and in all
> future versions:
>
> toCaseFold(toLowerCase(S, L), L)
> = toCaseFold(toUpperCase(S, L), L)
> = toCaseFold(toTitleCase(S, L), L)
> Are there existing exceptions?
Yes. U+0131 LATIN SMALL LETTER DOTLESS I lowercases and casefolds to
itself, but uppercases and titlecases to U+0049 LATIN CAPITAL LETTER I,
which then casefolds in the default casefolding to U+0069 LATIN SMALL LETTER
I.
U+0130 LATIN CAPITAL LETTER I WITH DOT ABOVE misbehaves similarly (mutatis
mutandis) in the default simple mappings.
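
The U+0131 exception can be checked with a short Python sketch. It assumes
that Python's `str.lower`, `str.upper`, `str.title` and `str.casefold` stand
in for the default (locale-independent) toLowerCase, toUpperCase,
toTitleCase and toCaseFold; `fold_after_mappings` and `is_exception` are
names made up for this illustration.

```python
def fold_after_mappings(s: str) -> dict[str, str]:
    """Case-fold each of the three default case mappings of s."""
    return {
        "lower": s.lower().casefold(),
        "upper": s.upper().casefold(),
        "title": s.title().casefold(),
    }


def is_exception(s: str) -> bool:
    """True if the three folded results are not all equal."""
    return len(set(fold_after_mappings(s).values())) > 1


# U+0131 LATIN SMALL LETTER DOTLESS I
print(fold_after_mappings("\u0131"))
# {'lower': 'ı', 'upper': 'i', 'title': 'i'}

# Scan every single code point for exceptions under these mappings.
exceptions = [f"U+{cp:04X}" for cp in range(0x110000) if is_exception(chr(cp))]
print("U+0131" in exceptions)  # True
```

Python applies the full case mappings rather than the simple ones, so as
far as I can tell this scan will not reproduce the U+0130 misbehaviour,
which is specific to the default simple mappings.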
> If so, are they bugs in the UCD to be corrected?
The misbehaviour above is a deliberate choice.
There does not appear to be a formal definition of case-folding for
Lithuanian. The procedure for calculating case-folding given in TUS does
not give perfect results. It does not really tell you that <U+0069, U+0307>
should case fold to <U+0069>, and gives no hint on what to do with <U+0049,
U+0307>.
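
For comparison, this small Python sketch shows what the default folding
does with the two Lithuanian sequences. It assumes `str.casefold`
implements the default full case folding, with no Lithuanian tailoring;
`codepoints` is a helper invented here for readable output.

```python
def codepoints(s: str) -> list[str]:
    """Render a string as a list of U+XXXX code point labels."""
    return [f"U+{ord(c):04X}" for c in s]


for s in ("i\u0307", "I\u0307"):
    print(codepoints(s), "->", codepoints(s.casefold()))
# ['U+0069', 'U+0307'] -> ['U+0069', 'U+0307']
# ['U+0049', 'U+0307'] -> ['U+0069', 'U+0307']
```

In both cases the default folding keeps U+0307, so neither sequence folds
to a bare <U+0069>.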
Richard.