What are the issues in having U+FB06 fold to U+FB05?

From: Karl Williamson (public@khwilliamson.com)
Date: Sun Jun 05 2011 - 10:17:31 CDT


    There are three pairs of characters in Unicode 6.0 in which each member
    of the pair has a full fold to the same sequence, yet there is no simple
    fold relation between them. They are:

    U+FB05 LATIN SMALL LIGATURE LONG S T and
    U+FB06 LATIN SMALL LIGATURE ST
    both fold to 'st' (the sequence "U+0073 U+0074");

    U+0390 GREEK SMALL LETTER IOTA WITH DIALYTIKA AND TONOS
    U+1FD3 GREEK SMALL LETTER IOTA WITH DIALYTIKA AND OXIA
    both fold to the sequence "U+03B9 U+0308 U+0301" or (the dot standing
    for concatenation)
    GREEK SMALL LETTER IOTA . COMBINING DIAERESIS . COMBINING ACUTE ACCENT

    U+03B0 GREEK SMALL LETTER UPSILON WITH DIALYTIKA AND TONOS
    U+1FE3 GREEK SMALL LETTER UPSILON WITH DIALYTIKA AND OXIA
    both fold to the sequence "U+03C5 U+0308 U+0301" or
    GREEK SMALL LETTER UPSILON . COMBINING DIAERESIS . COMBINING ACUTE ACCENT
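
A minimal sketch for checking the three pairs listed above, assuming Python 3, whose built-in str.casefold() applies full case folding. The PAIRS list and the codepoints() helper are illustrative names, not part of any Unicode tooling.

```python
# The three pairs from the message, written as escapes to avoid font issues.
PAIRS = [
    ("\uFB05", "\uFB06"),  # LATIN SMALL LIGATURE LONG S T / ST
    ("\u0390", "\u1FD3"),  # GREEK SMALL LETTER IOTA WITH DIALYTIKA AND TONOS / OXIA
    ("\u03B0", "\u1FE3"),  # GREEK SMALL LETTER UPSILON WITH DIALYTIKA AND TONOS / OXIA
]


def codepoints(s):
    """Render a string as space-separated U+XXXX code points."""
    return " ".join(f"U+{ord(ch):04X}" for ch in s)


for a, b in PAIRS:
    fa, fb = a.casefold(), b.casefold()  # full case folding
    print(f"{codepoints(a)} -> {codepoints(fa)}")
    print(f"{codepoints(b)} -> {codepoints(fb)}")
    print("  same full fold:", fa == fb)
```

Python offers no built-in simple case folding, so this only illustrates the full-fold half of the observation.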

    Under full case folding rules, each member of one of these pairs is
    caselessly equivalent to the other member, even without adding NFD
    rules. Correct me if I'm wrong, but shouldn't they also be caselessly
    equivalent under simple folding rules? If so, what issues would there
    be in creating an S rule for each of these pairs in CaseFolding.txt,
    so that they would be considered caselessly equivalent even by
    applications that don't do full case folding?
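
To make the question concrete, here is a rough sketch of how a simple-folding application might behave with and without such a rule, for the 'st' pair only. The two F lines restate the full foldings described above in CaseFolding.txt field order (code; status; mapping). The S line is hypothetical: it is the rule being asked about, with its direction taken from the subject line. The parse() and fold() helpers are illustrative names, not an existing API.

```python
# Excerpt in CaseFolding.txt field order: code; status; mapping; # name
# The S line is HYPOTHETICAL (the rule under discussion), not real data.
EXCERPT = """\
FB05; F; 0073 0074; # LATIN SMALL LIGATURE LONG S T
FB06; F; 0073 0074; # LATIN SMALL LIGATURE ST
FB06; S; FB05; # LATIN SMALL LIGATURE ST (hypothetical rule)
"""


def parse(text, statuses):
    """Build {char: folded string} from lines whose status is in `statuses`."""
    table = {}
    for line in text.splitlines():
        data = line.split("#", 1)[0].strip()  # drop the trailing comment
        if not data:
            continue
        code, status, mapping = [f.strip() for f in data.split(";")[:3]]
        if status in statuses:
            folded = "".join(chr(int(cp, 16)) for cp in mapping.split())
            table[chr(int(code, 16))] = folded
    return table


def fold(s, table):
    """Fold each character through the table; unmapped characters pass through."""
    return "".join(table.get(ch, ch) for ch in s)


FULL = parse(EXCERPT, {"C", "F"})         # full folding uses C + F
SIMPLE_NOW = parse(EXCERPT, {"C"})        # simple folding with no S rule for the pair
SIMPLE_WITH_S = parse(EXCERPT, {"C", "S"})  # simple folding plus the hypothetical S rule

full_match = fold("\uFB05", FULL) == fold("\uFB06", FULL)
now_match = fold("\uFB05", SIMPLE_NOW) == fold("\uFB06", SIMPLE_NOW)
with_s_match = fold("\uFB05", SIMPLE_WITH_S) == fold("\uFB06", SIMPLE_WITH_S)
print(full_match, now_match, with_s_match)
```

Under these assumptions the pair matches under full folding, fails to match under simple folding as it stands, and matches once the hypothetical S line is honored. The two Greek pairs would presumably work the same way.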


