From: Karl Williamson (public@khwilliamson.com)
Date: Sun Jun 05 2011 - 10:17:31 CDT
There are three pairs of characters in Unicode 6.0 in which each member
of the pair has a full fold to the same sequence, yet there is no simple
fold relation between them. They are:

    U+FB05 LATIN SMALL LIGATURE LONG S T
    U+FB06 LATIN SMALL LIGATURE ST
both fold to 'st';

    U+0390 GREEK SMALL LETTER IOTA WITH DIALYTIKA AND TONOS
    U+1FD3 GREEK SMALL LETTER IOTA WITH DIALYTIKA AND OXIA
both fold to the sequence "U+03B9 U+0308 U+0301" or (the dot standing
for concatenation)
    GREEK SMALL LETTER IOTA . COMBINING DIAERESIS . COMBINING ACUTE ACCENT;

    U+03B0 GREEK SMALL LETTER UPSILON WITH DIALYTIKA AND TONOS
    U+1FE3 GREEK SMALL LETTER UPSILON WITH DIALYTIKA AND OXIA
both fold to the sequence "U+03C5 U+0308 U+0301" or
    GREEK SMALL LETTER UPSILON . COMBINING DIAERESIS . COMBINING ACUTE ACCENT.
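
The fold targets listed above can be checked with a short Python sketch. It
assumes Python's built-in str.casefold(), which applies full case folding and
no normalization; the PAIRS table and the helper name are illustrative only.

```python
# Each pair from the list above, with the full-fold sequence given in the message.
PAIRS = [
    ("\uFB05", "\uFB06", "st"),
    ("\u0390", "\u1FD3", "\u03B9\u0308\u0301"),
    ("\u03B0", "\u1FE3", "\u03C5\u0308\u0301"),
]

def full_fold_matches(a, b, target):
    """True if a and b are distinct characters whose full fold is `target`."""
    return a != b and a.casefold() == target and b.casefold() == target

results = [full_fold_matches(a, b, t) for a, b, t in PAIRS]

for (a, b, t), ok in zip(PAIRS, results):
    target = " ".join(f"U+{ord(c):04X}" for c in t)
    print(f"U+{ord(a):04X} and U+{ord(b):04X} both fold to {target}: {ok}")
```

Every fold target here is longer than one character, which is why a one-to-one
(simple) mapping cannot express it directly.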

Under full case folding rules, the two members of each pair are
caselessly equivalent to each other, even without adding NFD
rules. Correct me if I'm wrong, but shouldn't they also be caselessly
equivalent under simple folding rules? If so, I'm wondering what issues
there would be in creating an S rule for each of these pairs in
CaseFolding.txt, so that they would be considered caselessly equivalent
even by applications that don't do full case folding.
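
To make the question concrete, here is a rough Python sketch of what such S
rules would change. Everything in it is an assumption for illustration: the
PROPOSED_S table (including which member maps to which) is hypothetical and is
not CaseFolding.txt data, and simple_fold_sketch is only a stand-in for simple
folding that keeps one-to-one results of str.casefold() and ignores
multi-character ones. It is not a faithful implementation of the C and S rules.

```python
# Hypothetical S-style entries (an assumption, not actual CaseFolding.txt data):
# map one member of each pair to the other so a one-to-one fold unifies them.
PROPOSED_S = {"\uFB05": "\uFB06", "\u1FD3": "\u0390", "\u1FE3": "\u03B0"}

PAIR_MEMBERS = [("\uFB05", "\uFB06"), ("\u0390", "\u1FD3"), ("\u03B0", "\u1FE3")]

def simple_fold_sketch(text, extra=None):
    """Rough stand-in for simple folding: each character maps to one character."""
    extra = extra or {}
    out = []
    for ch in text:
        if ch in extra:
            out.append(extra[ch])
            continue
        folded = ch.casefold()
        # Keep only one-to-one results; multi-character (full) folds are ignored.
        out.append(folded if len(folded) == 1 else ch)
    return "".join(out)

def equivalent(a, b, extra=None):
    return simple_fold_sketch(a, extra) == simple_fold_sketch(b, extra)

without_rule = [equivalent(a, b) for a, b in PAIR_MEMBERS]
with_rule = [equivalent(a, b, PROPOSED_S) for a, b in PAIR_MEMBERS]
print("without extra S entries:", without_rule)
print("with extra S entries:   ", with_rule)
```

In this sketch the pairs compare unequal with the one-to-one fold alone and
compare equal once the hypothetical entries are added.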