From: Richard Wordingham (richard.wordingham@ntlworld.com)
Date: Wed May 09 2007 - 02:55:38 CDT
Philippe Verdy wrote on Tuesday, May 08, 2007 at 8:57 PM
Subject: RE: Adding Lowercase Letters (was: Uppercase ß is coming? (U+1E9E))
> The special casing rules for Turkish do apply to the effect of case mappings
> to lowercase or to uppercase or to titlecase. But do they apply to the case
> folding (which is different from lowercase mapping)?
I can't work out whether the Turkish rules are advisory or mandatory. Unless
you count the comments (and these appear to recommend non-conformance in the
handling of subscript iota), the tables are incomplete. The tables for
Turkish do not respect canonical equivalence, as the comments caution. That
said, CaseFolding.txt does address Turkish case folding.
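As a minimal sketch of what CaseFolding.txt provides, its two entries with status T (U+0049 to U+0131 and U+0130 to U+0069) can be substituted before the default full folding. The helper name turkic_casefold is hypothetical, and Python's str.casefold() is assumed to implement the default full (C+F) folding. The last line illustrates the caution about canonical equivalence: U+0130 and its canonical decomposition <U+0049, U+0307> fold differently.

```python
# Status-T entries from CaseFolding.txt (used only for Turkic languages).
TURKIC_OVERRIDES = {
    "\u0049": "\u0131",  # LATIN CAPITAL LETTER I -> LATIN SMALL LETTER DOTLESS I
    "\u0130": "\u0069",  # LATIN CAPITAL LETTER I WITH DOT ABOVE -> LATIN SMALL LETTER I
}


def turkic_casefold(s: str) -> str:
    """Hypothetical helper: apply the T overrides, then default full folding."""
    # Substitute first, so str.casefold() never sees U+0049 or U+0130.
    overridden = "".join(TURKIC_OVERRIDES.get(ch, ch) for ch in s)
    # U+0131 and U+0069 are left unchanged by the default folding.
    return overridden.casefold()


print(turkic_casefold("\u0130STANBUL"))  # istanbul
print(turkic_casefold("ILIK"))           # ılık
# Not canonically closed: U+0130 vs. its decomposition <U+0049, U+0307>.
print(turkic_casefold("\u0130") == turkic_casefold("I\u0307"))  # False
```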
> I'd also like to find a precise reply to this question:
> Are the strings resulting from a case mapping to uppercase (or to lowercase,
> or to titlecase) required to have the same case folding? Id est:
> Are we guaranteed to have, with existing normative Unicode definitions and
> stability rules, for every string S in a locale L, the following equalities
> starting at some current or past version of the Unicode standard and in all
> future versions:
>
> toCaseFold(toLowerCase(S, L), L)
> = toCaseFold(toUpperCase(S, L), L)
> = toCaseFold(toTitleCase(S, L), L)
>
> Are there existing exceptions?
Yes. U+0131 LATIN SMALL LETTER DOTLESS I lowercases and case-folds to
itself, but uppercases and titlecases to U+0049 LATIN CAPITAL LETTER I,
which the default case folding then maps to U+0069 LATIN SMALL LETTER I.
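The U+0131 exception can be checked mechanically. The sketch below tests the three-way equality for the default, locale-independent mappings only (Python's str methods have no locale parameter L); folds_agree is a hypothetical name, and str.title() is only an approximation of toTitlecase because Python uses its own word boundaries rather than the Unicode word-break rules.

```python
def folds_agree(s: str) -> bool:
    """Test fold(lower(S)) == fold(upper(S)) == fold(title(S)) using
    Python's default full mappings (no Turkish or other locale tailoring)."""
    folded_lower = s.lower().casefold()
    folded_upper = s.upper().casefold()
    folded_title = s.title().casefold()
    return folded_lower == folded_upper == folded_title


print(folds_agree("Stra\u00dfe"))  # True: all three fold to "strasse"
print(folds_agree("\u0131"))       # False: "\u0131" vs. "i"
```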
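U+0130 LATIN CAPITAL LETTER I WITH DOT ABOVE misbehaves similarly (mutatis
mutandis) in the default simple mappings: it lowercases to U+0069, which
case-folds to itself, but it uppercases to itself and has no simple case
folding.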
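Python exposes only the full mappings, so the simple-mapping case needs a hand-copied table. The three small dictionaries below are an illustrative subset copied from UnicodeData.txt and the C/S entries of CaseFolding.txt (code points not listed map to themselves); apply_simple is a hypothetical helper. The last line shows that with the full mappings U+0130 behaves consistently, since both routes give <U+0069, U+0307>.

```python
# Hand-copied subset of the simple (one-to-one) mappings; absent code
# points map to themselves.
SIMPLE_LOWER = {"\u0049": "\u0069", "\u0130": "\u0069"}
SIMPLE_UPPER = {"\u0069": "\u0049", "\u0131": "\u0049"}
SIMPLE_FOLD = {"\u0049": "\u0069"}  # U+0130 has only F and T entries


def apply_simple(table: dict, s: str) -> str:
    """Apply a one-to-one mapping table code point by code point."""
    return "".join(table.get(ch, ch) for ch in s)


s = "\u0130"
print(apply_simple(SIMPLE_FOLD, apply_simple(SIMPLE_LOWER, s)))  # i
print(apply_simple(SIMPLE_FOLD, apply_simple(SIMPLE_UPPER, s)))  # İ (unchanged)

# The full mappings behind str.lower/str.upper/str.casefold do agree:
print(s.lower().casefold() == s.upper().casefold())  # True
```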
> If so, are they bugs in the UCD to be corrected?
The misbehaviour above is a deliberate choice.
There does not appear to be a formal definition of case folding for
Lithuanian. The procedure for deriving case folding given in TUS (The
Unicode Standard) does not give perfect results: it does not really tell you
that <U+0069, U+0307> should case-fold to <U+0069>, and it gives no hint of
what to do with <U+0049, U+0307>.
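One possible heuristic, offered here purely as an assumption and not as any published definition, is to fold by way of a Lithuanian uppercase. SpecialCasing.txt's lt rule removes U+0307 COMBINING DOT ABOVE under the condition After_Soft_Dotted, so <U+0069, U+0307> uppercases to <U+0049> and then folds to <U+0069>. The helper names and the three-letter SOFT_DOTTED subset are hypothetical (the full set is the Soft_Dotted property in PropList.txt). Note that the heuristic simply leaves <U+0049, U+0307> folding to <U+0069, U+0307>, so it does not settle that case.

```python
import unicodedata

# Hypothetical subset of the Soft_Dotted property (see PropList.txt).
SOFT_DOTTED = {"\u0069", "\u006A", "\u012F"}  # i, j, i with ogonek


def lt_upper(s: str) -> str:
    """Uppercase, dropping U+0307 when After_Soft_Dotted holds (lt rule)."""
    out = []
    after_soft_dotted = False
    for ch in s:
        drop = ch == "\u0307" and after_soft_dotted
        ccc = unicodedata.combining(ch)
        # Only combining class 0 or 230 (Above) changes the context.
        if ccc == 0:
            after_soft_dotted = ch in SOFT_DOTTED
        elif ccc == 230:
            after_soft_dotted = False
        if not drop:
            out.append(ch.upper())
    return "".join(out)


def lt_casefold_via_upper(s: str) -> str:
    """Heuristic Lithuanian folding: Lithuanian uppercase, then default fold."""
    return lt_upper(s).casefold()


print(lt_casefold_via_upper("i\u0307") == "i")              # True
print(lt_casefold_via_upper("i\u0323\u0307") == "i\u0323")  # True: dot below does not block
print(lt_casefold_via_upper("I\u0307") == "i\u0307")        # True: left as is
```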
Richard. 
This archive was generated by hypermail 2.1.5 : Wed May 09 2007 - 02:58:56 CDT