Default case algorithms
verdy_p at wanadoo.fr
Wed Jun 25 07:37:39 CDT 2014
2014-06-25 10:52 GMT+02:00 Daniel Bünzli <daniel.buenzli at erratique.ch>:
> On Wednesday, 25 June 2014 at 09:10, Richard Wordingham wrote:
> > Yes - with the caveat that the uppercase mapping of U+0345 is too
> > complicated to define formally.
> > On the other hand, the Lowercase_Mapping property seems to be inadequate
> > for the default lowercase mapping - Greek final sigma is the
> > complication.
> So what you seem to imply is that Unicode’s default full casing is
> defined by applying
> 1) The unconditional mappings of SpecialCasing.txt
> 2) The conditional mappings of SpecialCasing.txt (there’s only one, the
> sigma case).
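As a rough illustration of those two steps (a sketch, not a definition of
the algorithm): CPython's built-in str.upper() and str.lower() use the full
case mappings, and lower() also appears to apply the Final_Sigma condition.
The sample strings below are illustrative choices, not taken from the
thread.

```python
# Step 1: unconditional full mappings (one-to-many), as in SpecialCasing.txt.
sharp_s_upper = "ß".upper()          # U+00DF LATIN SMALL LETTER SHARP S
ligature_upper = "\ufb01".upper()    # U+FB01 LATIN SMALL LIGATURE FI

# Step 2: the language-independent conditional mapping, Final_Sigma.
# A capital sigma at the end of a word lowercases to U+03C2 (final sigma);
# elsewhere it lowercases to U+03C3.
word_lower = "ΟΔΥΣΣΕΥΣ".lower()

# A lone capital sigma has no preceding cased letter, so the condition
# does not apply and the ordinary small sigma is produced.
lone_sigma_lower = "Σ".lower()

print(sharp_s_upper)     # SS
print(ligature_upper)    # FI
print(word_lower)        # οδυσσευς
print(lone_sigma_lower)  # σ
```

Note that none of this is language-sensitive: no locale is passed in, which
is why the Turkic rules are not covered by these built-in methods.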
There's also the Turkic i or j (problems related to letters that are
usually soft-dotted in the Latin script except in Turkic languages, whose
case mapping is context-dependent: the characters on the right side decide
whether a combining dot above needs to be added).
We could insist that Turkish texts use an explicit combining dot above
after the dotless i (or j), but most Turkish texts just use the plain
ASCII letter, reinterpreting its soft dot as a hard dot that needs to
be added when converting to uppercase, and removed when converting to
lowercase. Note also that the dotless i or dotless j are not part of any
For Turkish readers, a dotless i followed by an explicit combining dot
above (hard dot) is not recommended, and they use the standard ASCII letter
directly, as if it were a precombined but decomposable letter. In Turkish
texts, a dotless i without diacritic pairs with a capital ASCII letter I
directly (this mapping to uppercase is *not* contextual, but the reverse
conversion to lowercase *is* contextual).
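The asymmetry can be sketched in a few lines of Python. The helper names
turkish_upper and turkish_lower are hypothetical, and the context test is
deliberately simplified: it only looks at an immediately following U+0307,
whereas the Not_Before_Dot condition in SpecialCasing.txt also allows some
intervening combining marks.

```python
DOTLESS_I = "\u0131"       # LATIN SMALL LETTER DOTLESS I
CAPITAL_DOTTED_I = "\u0130"  # LATIN CAPITAL LETTER I WITH DOT ABOVE
DOT_ABOVE = "\u0307"       # COMBINING DOT ABOVE

def turkish_upper(text):
    """Uppercase with Turkic rules (hypothetical helper); not contextual."""
    # The soft dot of "i" becomes a hard dot; the default mapping
    # already sends dotless i to plain "I".
    return text.replace("i", CAPITAL_DOTTED_I).upper()

def turkish_lower(text):
    """Lowercase with Turkic rules (hypothetical helper); contextual."""
    out = []
    pos = 0
    while pos < len(text):
        ch = text[pos]
        if ch == CAPITAL_DOTTED_I:
            out.append("i")
        elif ch == "I":
            # Simplified context check: is the very next character a
            # combining dot above?
            if pos + 1 < len(text) and text[pos + 1] == DOT_ABOVE:
                out.append("i")
                pos += 1  # the explicit dot is absorbed into "i"
            else:
                out.append(DOTLESS_I)
        else:
            out.append(ch)
        pos += 1
    # Everything else follows the default lowercase mapping.
    return "".join(out).lower()

print(turkish_upper("diyarbak\u0131r"))
print(turkish_lower("D\u0130YARBAKIR"))
print(turkish_lower("I" + DOT_ABOVE))
```

With these rules a Turkish word round-trips through uppercase and back,
while the default mapping would turn the capital I into a dotted i.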