Re: What are the issues in having U+FB06 fold to U+FB05?

From: Mark Davis ☕ <mark_at_macchiato.com>
Date: Wed, 6 Jul 2011 13:40:12 -0700

Mark
*— Il meglio è l’inimico del bene (The best is the enemy of the good) —*

On Sat, Jun 11, 2011 at 08:04, Karl Williamson <public_at_khwilliamson.com> wrote:

> On 06/08/2011 03:33 PM, Mark Davis ☕ wrote:
>
>> As to the first, it would seem reasonable. The simple folding is not
>> covered by the following stability policies:
>>
>> http://www.unicode.org/policies/stability_policy.html#Case_Folding
>> http://www.unicode.org/policies/stability_policy.html#Case_Pair
>>
>> However, the committee may be leery of changing these even though they
>> are not covered by those policies. You can file a request form for the
>> committee to consider it, at http://unicode.org/reporting.html
>>
>> The other two are special cases; they casefold together because of the
>> way that the full case mapping is computed. Their equivalence is
>> normally captured by a canonical-equivalent folding. Because the simple
>> folding works only code point by code point, and only results in single
>> code points, they can't be added.
>>
> I didn't understand the sentence above. But would it be fair to say that
> a plausible case could be made for FB06 folding to FB05 simply, but that
> there really shouldn't be a simple fold for the other two cases?
>

Yes, that's what I mean. You can propose all three via the reporting form
if you want, but in my opinion only #1 (U+FB05/U+FB06) is a real possibility.
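A minimal Python sketch of the distinction drawn above, assuming Python 3 and its standard `unicodedata` module: the Greek pairs are canonically equivalent, so NFD alone already unifies them, whereas the two ligatures are only compatibility-related and are unified by full case folding. `str.casefold()` applies full case folding; `nfd` and `canonical_caseless_match` are hypothetical helper names, the latter illustrating canonical caseless matching as the Unicode Standard defines it, NFD(toCasefold(NFD(X))).

```python
import unicodedata

# Characters discussed in the thread.
IOTA_TONOS = "\u0390"  # GREEK SMALL LETTER IOTA WITH DIALYTIKA AND TONOS
IOTA_OXIA = "\u1fd3"   # GREEK SMALL LETTER IOTA WITH DIALYTIKA AND OXIA
LONG_S_T = "\ufb05"    # LATIN SMALL LIGATURE LONG S T
S_T = "\ufb06"         # LATIN SMALL LIGATURE ST


def nfd(s: str) -> str:
    # Hypothetical helper: canonical decomposition (Normalization Form D).
    return unicodedata.normalize("NFD", s)


def canonical_caseless_match(a: str, b: str) -> bool:
    # Hypothetical helper: compares NFD(casefold(NFD(x))) for both strings.
    # str.casefold() applies full case folding (C + F entries).
    return nfd(nfd(a).casefold()) == nfd(nfd(b).casefold())


# The Greek pair is canonically equivalent: NFD alone unifies it.
print(nfd(IOTA_TONOS) == nfd(IOTA_OXIA))  # True
# The ligatures are only compatibility-related, so NFD keeps them apart...
print(nfd(LONG_S_T) == nfd(S_T))  # False
# ...but full case folding maps both to "st".
print(canonical_caseless_match(LONG_S_T, S_T))  # True
print(canonical_caseless_match(IOTA_TONOS, IOTA_OXIA))  # True
```

Canonical decompositions are fixed by Unicode's normalization stability policy, so the NFD results do not depend on which Unicode version the running Python bundles.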

>> On Sun, Jun 5, 2011 at 08:17, Karl Williamson <public_at_khwilliamson.com> wrote:
>>
>> There are three pairs of characters in Unicode 6.0 in which each
>> member of the pair has a full fold to the same sequence, yet there
>> is no simple fold relation between them. They are:
>>
>> U+FB05 LATIN SMALL LIGATURE LONG S T and
>> U+FB06 LATIN SMALL LIGATURE ST
>> both fold to 'st';
>>
>> U+0390 GREEK SMALL LETTER IOTA WITH DIALYTIKA AND TONOS and
>> U+1FD3 GREEK SMALL LETTER IOTA WITH DIALYTIKA AND OXIA
>> both fold to the sequence "U+03B9 U+0308 U+0301", i.e. (with the dot
>> standing for concatenation)
>> GREEK SMALL LETTER IOTA . COMBINING DIAERESIS . COMBINING ACUTE ACCENT
>>
>> U+03B0 GREEK SMALL LETTER UPSILON WITH DIALYTIKA AND TONOS and
>> U+1FE3 GREEK SMALL LETTER UPSILON WITH DIALYTIKA AND OXIA
>> both fold to the sequence "U+03C5 U+0308 U+0301", i.e.
>> GREEK SMALL LETTER UPSILON . COMBINING DIAERESIS . COMBINING ACUTE ACCENT
>>
>> Under full case folding rules, the two members of each pair are
>> caselessly equivalent, even without adding NFD rules. Correct me if
>> I'm wrong, but shouldn't they also be caselessly equivalent under
>> simple folding rules? If so, what issues would there be in creating an
>> S rule for these pairs in CaseFolding.txt, so that they would be
>> considered caselessly equivalent even by applications that don't do
>> full case folding?
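Karl's original question concerns the S (simple) entries in CaseFolding.txt. The sketch below, assuming Python 3.9+, parses entries in that file's `<code>; <status>; <mapping>; # <name>` format, treats C + F entries as full folding and C + S entries as simple folding, and shows what a hypothetical `FB05; S; FB06` entry would change. The sample entries restate only the mappings described in the thread (plus one C entry for contrast), T (Turkic) entries are ignored, and `parse_case_folding` and `fold` are illustrative helpers, not a real library API.

```python
# Entries in CaseFolding.txt format ("<code>; <status>; <mapping>; # <name>")
# restating the mappings described in the thread, plus one C entry for contrast.
SAMPLE = """\
0053; C; 0073; # LATIN CAPITAL LETTER S
0390; F; 03B9 0308 0301; # GREEK SMALL LETTER IOTA WITH DIALYTIKA AND TONOS
1FD3; F; 03B9 0308 0301; # GREEK SMALL LETTER IOTA WITH DIALYTIKA AND OXIA
FB05; F; 0073 0074; # LATIN SMALL LIGATURE LONG S T
FB06; F; 0073 0074; # LATIN SMALL LIGATURE ST
"""

# HYPOTHETICAL simple-folding entry of the kind proposed in the thread.
PROPOSED_S_RULE = "FB05; S; FB06; # LATIN SMALL LIGATURE LONG S T\n"


def parse_case_folding(text: str) -> dict[str, dict[int, list[int]]]:
    """Group mappings by status letter (C, F, S, T)."""
    table: dict[str, dict[int, list[int]]] = {}
    for line in text.splitlines():
        data = line.split("#", 1)[0].strip()  # drop the trailing comment
        if not data:
            continue
        code, status, mapping = (f.strip() for f in data.split(";")[:3])
        table.setdefault(status, {})[int(code, 16)] = [int(cp, 16) for cp in mapping.split()]
    return table


def fold(s: str, table: dict[str, dict[int, list[int]]], statuses: str) -> str:
    """Map each code point through the entries whose status is in `statuses`.
    "CS" gives simple folding, "CF" gives full folding; T entries are ignored."""
    merged: dict[int, list[int]] = {}
    for status in statuses:
        merged.update(table.get(status, {}))
    return "".join(
        "".join(chr(cp) for cp in merged.get(ord(ch), [ord(ch)])) for ch in s
    )


base = parse_case_folding(SAMPLE)
proposed = parse_case_folding(SAMPLE + PROPOSED_S_RULE)

# Full folding: both ligatures become "st".
print(fold("\ufb05", base, "CF") == fold("\ufb06", base, "CF"))  # True
# Simple folding without the S entry: no relation between the ligatures.
print(fold("\ufb05", base, "CS") == fold("\ufb06", base, "CS"))  # False
# With the proposed S entry, simple folding maps U+FB05 to U+FB06.
print(fold("\ufb05", proposed, "CS") == fold("\ufb06", proposed, "CS"))  # True
```

Because simple folding maps each code point to exactly one code point, an S entry can only point U+FB05 at another single character such as U+FB06; it cannot express the multi-code-point "st" result that the F entries give.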
Received on Wed Jul 06 2011 - 15:42:17 CDT

This archive was generated by hypermail 2.2.0 : Wed Jul 06 2011 - 15:42:18 CDT