From: Richard Wordingham (richard.wordingham@ntlworld.com)
Date: Fri Jun 09 2006 - 10:04:19 CDT
Philippe Verdy wrote on Friday, June 09, 2006 at 7:34 AM
> From: "Mike" <mike-list@pobox.com>
>>> To answer this question, ask yourself what would happen if you
>>> uppercased the string "Straße" this way.
>> I think I would get the right answer, "STRASSE" (if
>> that is the "sharp S" I have learned about a few weeks
>> back).
> Wrong. The case folding of the sharp s is a sharp s. The standard case
> folding does not convert any letter to uppercase.
Is there a 'standard' case folding? There are two default case foldings:
the simple case folding and the full case folding! The simple case folding
is as you state; the full case folding is to 'ss'. This results from the
full upper-casing being 'SS', so Mike's answer is correct. Or are you
saying that 'ffrench' should not case-fold to the same string as 'Ffrench'?
(Incidentally, how should we handle the locale-specific titlecasing here?
It's a bit more local than simply 'en'!)
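To make the difference concrete, here is a minimal sketch in Python. It assumes Python 3, where str.casefold() applies the full case folding and str.upper() the full upper-casing. As far as I know the standard library has no simple case folding, so str.lower() stands in for it here; for the sharp s the two give the same result. The helper name caseless_equal is mine, purely for illustration.

```python
# Sketch only: str.casefold() is used here as the full case folding.
word = "Stra\u00dfe"                  # "Straße"

full_fold = word.casefold()           # full folding maps the sharp s to "ss"
full_upper = word.upper()             # full upper-casing maps the sharp s to "SS"
sharp_s_lower = "\u00df".lower()      # lower-casing leaves the sharp s unchanged


def caseless_equal(a: str, b: str) -> bool:
    """Hypothetical helper: compare two strings under full case folding."""
    return a.casefold() == b.casefold()
```

Under the full folding, caseless_equal("ffrench", "Ffrench") and caseless_equal("Straße", "STRASSE") should both hold; under a folding that keeps the sharp s, the second would not.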
> Note that if you compare case-insensitively and don't care about other
> variations (at secondary collation level or higher), you can greatly
> reduce the complexity of the algorithm and get a much faster result using
> the following:
>
> toLowerCase( toUpperCase(filter(NFKD( string ))) )
>
> where the filter() function eliminates all combining characters with
> combining class greater than zero.
It's a shame it filters out all the Tibetan vowels.
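A rough sketch of the quoted pipeline, in Python with the standard unicodedata module, shows the problem. The function name verdy_key and the two sample syllables are my own choices for illustration; the Tibetan vowel signs carry non-zero combining classes, so the filter drops them along with the accents.

```python
import unicodedata


def verdy_key(s: str) -> str:
    """Sketch of toLowerCase(toUpperCase(filter(NFKD(s)))) as quoted above."""
    decomposed = unicodedata.normalize("NFKD", s)
    # filter(): drop every character whose canonical combining class is non-zero
    filtered = "".join(ch for ch in decomposed if unicodedata.combining(ch) == 0)
    return filtered.upper().lower()


# TIBETAN LETTER KA followed by VOWEL SIGN I and by VOWEL SIGN U respectively
ki = "\u0f40\u0f72"
ku = "\u0f40\u0f74"

# Both vowel signs have a non-zero combining class, so filter() removes them
vowel_classes = [unicodedata.combining(ch) for ch in "\u0f72\u0f74"]

# The two distinct syllables therefore reduce to the same key
collide = verdy_key(ki) == verdy_key(ku)
```

With this sketch verdy_key("Straße") still comes out as "strasse", but ki and ku both reduce to the bare consonant, so they would compare equal.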
Richard.