Default case algorithms
daniel.buenzli at erratique.ch
Tue Jun 24 11:46:10 CDT 2014
On Tuesday, 24 June 2014 at 16:07, Markus Scherer wrote:
> > Does an algorithm that simply applies R1 *regardless of context* constitute a default case algorithm or not ? I.e. does simply mapping each character C in a string using Uppercase_Mapping (C) (e.g. as exposed by the XML UCD) constitute a default case conversion as mandated by the standard ?
> It implements simple uppercasing but not full uppercasing.
Not really. IIUC, simple uppercasing is what I would get if I used the Simple_Uppercase_Mapping property. I’m using the Uppercase_Mapping property of the XML UCD.
> It misses simple, common things like ß -> SS (which is neither language-dependent nor context-sensitive).
This is actually included in the Uppercase_Mapping property of the XML UCD. Looking at the data, it seems that the Uppercase_Mapping property of the XML UCD includes (using the terminology of SpecialCasing.txt):
* All the unconditional mappings of SpecialCasing.txt (context independent)
* None of the conditional mappings of SpecialCasing.txt (context dependent)
* None of the language sensitive mappings (context and language dependent)
So what am I implementing if I just map a string using the XML UCD’s Uppercase_Mapping property? Is that Unicode’s default uppercase mapping?
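To make the question concrete, here is a minimal sketch of the context-free, per-character mapping I have in mind. The two tables are tiny hand-written excerpts standing in for the real data, and the names UNCONDITIONAL_UPPER, SIMPLE_UPPER, uppercase_mapping and map_string are mine, not anything defined by the UCD or the standard:

```python
# Hypothetical excerpt of the unconditional (context-independent) full
# uppercase mappings, i.e. the kind of SpecialCasing.txt entry that has
# no condition attached.
UNCONDITIONAL_UPPER = {
    "\u00DF": "SS",  # ß -> SS
    "\uFB01": "FI",  # ﬁ (fi ligature) -> FI
}

# Hypothetical excerpt of the simple one-to-one uppercase mappings.
SIMPLE_UPPER = {"a": "A", "e": "E", "r": "R", "s": "S", "t": "T"}


def uppercase_mapping(c):
    """Map a single character, looking at nothing but the character itself."""
    if c in UNCONDITIONAL_UPPER:
        return UNCONDITIONAL_UPPER[c]
    # Fall back to the simple mapping, or to the character unchanged.
    return SIMPLE_UPPER.get(c, c)


def map_string(s):
    """Map every character independently of its neighbours and of any language."""
    return "".join(uppercase_mapping(c) for c in s)
```

With these excerpts, map_string("straße") gives "STRASSE", so ß -> SS is covered even though no context or language is ever consulted.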
(I did file a bug about that as you suggested; the text is below for those who are interested.)
The default casing algorithms of §3.13 don't really make it clear *whether*, or *which*, context- and language-dependent case mappings have to be applied in order to implement the default case mapping algorithms. Besides, the definitions seem to contradict each other.
1. The Definitions section seems to imply that all the case mappings of SpecialCasing.txt and UnicodeData.txt have to be used in order to get the full case mapping properties of a character C.
2. The Tailoring section indicates that the SpecialCasing.txt file contains data to assist implementation of certain *tailorings* of the default case algorithm, which contradicts 1.
3. To muddy things further, the XML UCD exposes full case mapping properties that, as far as I can tell, contain only the context *insensitive* mappings of SpecialCasing.txt.
This makes it hard to understand what should be done for implementing proper Unicode default case conversion.
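As an illustration of what a context-dependent mapping changes in practice, here is a small sketch built on the Final_Sigma condition of SpecialCasing.txt, which concerns lowercasing rather than uppercasing. It assumes that CPython 3's built-in str.lower applies that condition when given a whole string (as far as I know it has done so since 3.3); the helper lower_per_character is a name of my own:

```python
def lower_per_character(s):
    """Lowercase each character in isolation, so no character sees its context."""
    return "".join(c.lower() for c in s)


# Greek capital letters OMICRON, DELTA, OMICRON, SIGMA.
word = "\u039F\u0394\u039F\u03A3"

# Whole-string lowercasing: assumed to apply Final_Sigma to the last letter.
whole = word.lower()

# Per-character lowercasing: no context is available to any character.
isolated = lower_per_character(word)
```

Under that assumption, whole ends with U+03C2 (final sigma) while isolated ends with U+03C3 (ordinary sigma), which is the kind of difference the definitions ought to make unambiguous.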