Re: What are the issues in having U+FB06 fold to U+FB05?

From: Mark Davis ☕ (mark@macchiato.com)
Date: Wed Jun 08 2011 - 16:33:47 CDT

Next message: Peter Constable: "RE: Character Identity and Font Selection"

Previous message: Andrew West: "Re: Character Identity and Font Selection"
In reply to: Karl Williamson: "What are the issues in having U+FB06 fold to U+FB05?"
Next in thread: Karl Williamson: "Re: What are the issues in having U+FB06 fold to U+FB05?"
Reply: Karl Williamson: "Re: What are the issues in having U+FB06 fold to U+FB05?"

As to the first, it would seem reasonable. The simple folding is not covered
by the following stability policies:

http://www.unicode.org/policies/stability_policy.html#Case_Folding
http://www.unicode.org/policies/stability_policy.html#Case_Pair

However, the committee may be leery of changing these even though they are
not covered by those policies. You can file a request form for the committee
to consider it, at http://unicode.org/reporting.html

The other two are special cases; they casefold together because of the way
that the full case mapping is computed. Their equivalence is normally
captured by a canonical-equivalent folding. Because the simple folding works
code point by code point, and can only result in a single code point, they
can't be added.
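
To make the distinction concrete, here is a minimal Python sketch. It assumes
that Python's str.casefold() applies the full case folding and that the
unicodedata module reflects a Unicode version containing these characters;
the helper names are made up for illustration. It checks that each pair
matches under full folding, and that only the two Greek pairs are also
canonically equivalent (the ligatures are not).

```python
import unicodedata

# The three pairs from the original question.
PAIRS = [
    ("\uFB05", "\uFB06"),  # LATIN SMALL LIGATURE LONG S T / LIGATURE ST
    ("\u0390", "\u1FD3"),  # GREEK IOTA WITH DIALYTIKA AND TONOS / OXIA
    ("\u03B0", "\u1FE3"),  # GREEK UPSILON WITH DIALYTIKA AND TONOS / OXIA
]

def full_fold_equal(a: str, b: str) -> bool:
    """Compare under full case folding (assumed: str.casefold() uses it)."""
    return a.casefold() == b.casefold()

def canonically_equivalent(a: str, b: str) -> bool:
    """Compare canonical decompositions (NFD), with no case folding."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)

def code_points(s: str) -> str:
    """Render a string as a list of U+XXXX code points."""
    return " ".join(f"U+{ord(ch):04X}" for ch in s)

if __name__ == "__main__":
    for a, b in PAIRS:
        print(
            code_points(a), code_points(b),
            "full fold ->", code_points(a.casefold()),
            "| equal under full fold:", full_fold_equal(a, b),
            "| canonically equivalent:", canonically_equivalent(a, b),
        )
```

Note that the full fold of each character is two or three code points long,
which is exactly what a simple (one code point to one code point) folding
cannot express.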

Mark

*— Il meglio è l’inimico del bene —*

On Sun, Jun 5, 2011 at 08:17, Karl Williamson <public@khwilliamson.com> wrote:

> There are three pairs of characters in Unicode 6.0 in which each member of
> the pair has a full fold to the same sequence, yet there is no simple fold
> relation between them. They are:
>
> U+FB05 LATIN SMALL LIGATURE LONG S T and
> U+FB06 LATIN SMALL LIGATURE ST
> both fold to 'st';
>
> U+0390 GREEK SMALL LETTER IOTA WITH DIALYTIKA AND TONOS
> U+1FD3 GREEK SMALL LETTER IOTA WITH DIALYTIKA AND OXIA
> both fold to the sequence "U+03B9 U+0308 U+0301" or (the dot standing for
> concatenation)
> GREEK SMALL LETTER IOTA . COMBINING DIAERESIS . COMBINING ACUTE ACCENT
>
> U+03B0 GREEK SMALL LETTER UPSILON WITH DIALYTIKA AND TONOS
> U+1FE3 GREEK SMALL LETTER UPSILON WITH DIALYTIKA AND OXIA
> both fold to the sequence "U+03C5 U+0308 U+0301" or
> GREEK SMALL LETTER UPSILON . COMBINING DIAERESIS . COMBINING ACUTE ACCENT
>
> Under full case folding rules, each member of one of these pairs is
> caselessly equivalent to the other member, even without adding NFD rules.
> Correct me if I'm wrong, but shouldn't they also be caselessly equivalent
> under simple folding rules? If so, I'm wondering what issues there would be
> in creating an S rule for these pairs in CaseFolding.txt, so that they would
> be considered caselessly equivalent even for applications that don't do full
> case folding?

