From: Richard Wordingham (richard.wordingham@ntlworld.com)
Date: Thu Oct 19 2006 - 17:59:43 CST
Andrew Miller wrote on Thursday, October 19, 2006 11:44 PM
> There appear to be a number of differences in the case mappings defined in
> UnicodeData.txt and SpecialCasing.txt.
> Can I just ignore the UnicodeData.txt mappings for these characters, and
> just use the ones defined in SpecialCasing.txt instead?
Just using UnicodeData.txt gives one the 'simple default case mappings';
overriding it with SpecialCasing.txt gives one the 'full default case
mappings' (TUS 4.0 Section 3.13).
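As a rough illustration of that override, here is a minimal Python sketch. The two tables are hand-copied excerpts covering only a few uppercase entries (the names SIMPLE_UPPER and SPECIAL_UPPER are made up for this example), and only unconditional SpecialCasing.txt entries are considered; a real implementation would parse both files and also handle the conditional, language-sensitive entries.

```python
# Hypothetical excerpts: uppercase field of UnicodeData.txt (one code point
# to one code point) and unconditional uppercase entries of SpecialCasing.txt
# (one code point to a sequence). Real code would parse the data files.
SIMPLE_UPPER = {
    0x0061: 0x0041,  # a -> A
    0x1F80: 0x1F88,  # alpha with psili and ypogegrammeni -> prosgegrammeni form
}
SPECIAL_UPPER = {
    0x00DF: (0x0053, 0x0053),  # sharp s -> SS
    0xFB01: (0x0046, 0x0049),  # fi ligature -> FI
    0x1F80: (0x1F08, 0x0399),  # overrides the simple mapping above
}


def simple_upper(text):
    """Simple default mapping: UnicodeData.txt only, always 1:1."""
    return "".join(chr(SIMPLE_UPPER.get(ord(ch), ord(ch))) for ch in text)


def full_upper(text):
    """Full default mapping: SpecialCasing.txt entries take precedence."""
    out = []
    for ch in text:
        cp = ord(ch)
        if cp in SPECIAL_UPPER:
            out.extend(chr(c) for c in SPECIAL_UPPER[cp])
        else:
            out.append(chr(SIMPLE_UPPER.get(cp, cp)))
    return "".join(out)


sample = "a\u00df\ufb01\u1f80"
print([hex(ord(c)) for c in simple_upper(sample)])
print([hex(ord(c)) for c in full_upper(sample)])
```

For these four characters, full_upper() should agree with Python's built-in str.upper(), which applies the unconditional full mappings, while simple_upper() leaves the sharp s and the ligature untouched because UnicodeData.txt gives them no uppercase.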
To keep things 'simple', a process performing a 'full default case mapping'
is not a Unicode-conformant process in the sense of TUS 4.0 Section 3.2 C9.
(I gave bad advice in the past because I thought it would be.) The default
case mappings work on strings of characters, not on strings of characters
modulo canonical equivalence, and they only work properly if the
ypogegrammeni-containing character is not followed by a character of lesser
non-zero combining class. (Currently - Unicode 5.0 - all other combining
characters with a non-zero combining class have a lesser one, but this may
well not be true in Unicode 5.1.)
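The problem can be reproduced with a small Python sketch. It assumes Python's str.upper() as the full default case mapping and uses the standard unicodedata module for normalization; the helper canonically_equivalent() is a name invented for this example. An alpha followed by ypogegrammeni and then an acute accent is canonically equivalent to its reordered (NFD) form, but uppercasing the two forms gives results that are no longer equivalent, because the ypogegrammeni becomes a base letter (capital iota) and the acute ends up on a different letter.

```python
import unicodedata

YPOGEGRAMMENI = "\u0345"  # COMBINING GREEK YPOGEGRAMMENI
ACUTE = "\u0301"          # COMBINING ACUTE ACCENT, a lesser combining class


def canonically_equivalent(a, b):
    """Two strings are canonically equivalent if their NFD forms match."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)


# Ypogegrammeni followed by a mark of lesser non-zero combining class.
original = "\u03b1" + YPOGEGRAMMENI + ACUTE
# Canonical reordering moves the acute in front of the ypogegrammeni.
reordered = unicodedata.normalize("NFD", original)

upper_original = original.upper()    # acute now follows capital iota
upper_reordered = reordered.upper()  # acute still follows capital alpha

print(unicodedata.combining(YPOGEGRAMMENI), unicodedata.combining(ACUTE))
print(canonically_equivalent(original, reordered))
print(canonically_equivalent(upper_original, upper_reordered))
```

Normalizing to NFD before case mapping sidesteps the problem for this input, since the ypogegrammeni then comes last among the combining marks.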
Note that U+0131 LATIN SMALL LETTER DOTLESS I does not case-fold with
anything (except in the 'standard' Turkish customisation) - I'm told this
peculiar behaviour will be documented in TUS 5.0.
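A quick way to see this, assuming Python's str.casefold() as an implementation of the default (non-Turkish) case folding; caseless_match() is a helper name made up for this sketch:

```python
DOTLESS_I = "\u0131"  # LATIN SMALL LETTER DOTLESS I


def caseless_match(a, b):
    """Compare two strings under default case folding."""
    return a.casefold() == b.casefold()


# Uppercasing maps dotless i to plain capital I ...
print(DOTLESS_I.upper() == "I")
# ... but case folding leaves dotless i unchanged,
print(DOTLESS_I.casefold() == DOTLESS_I)
# so it does not match 'I' or 'i' in a caseless comparison.
print(caseless_match(DOTLESS_I, "I"))
print(caseless_match(DOTLESS_I, "i"))
```

So a string and its own uppercase form can fail a default caseless comparison when dotless i is involved.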
The Lithuanian and Turkish case mappings only work well for these languages
and those like them - unusual accents on 'i's will cause total confusion.
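For the plain, unaccented cases, the Turkish customisation can be approximated as below. This is a deliberately simplified sketch: turkish_upper() and turkish_lower() are invented names, the code only swaps the four i letters before falling back on Python's default mappings, and it ignores the context-sensitive rules for combining dots in SpecialCasing.txt, which is exactly where accented 'i's go wrong.

```python
# Simplified Turkish tailoring: handle the dotted/dotless i pairs first,
# then let the default mappings deal with everything else.
_TR_UPPER = {ord("i"): "\u0130", ord("\u0131"): "I"}  # i -> İ, ı -> I
_TR_LOWER = {ord("I"): "\u0131", ord("\u0130"): "i"}  # I -> ı, İ -> i


def turkish_upper(text):
    return text.translate(_TR_UPPER).upper()


def turkish_lower(text):
    return text.translate(_TR_LOWER).lower()


word = "diyarbak\u0131r"
print(turkish_upper(word))
print(turkish_lower(turkish_upper(word)) == word)
# The default mappings do not round-trip this word.
print(word.upper().lower() == word)
```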
Richard.