Re: Questions about UAX #29

From: Karl Williamson <public_at_khwilliamson.com>
Date: Mon, 04 Jul 2011 21:17:20 -0600

On 07/03/2011 05:52 PM, Mark Davis ☕ wrote:
>
> On Sat, Jul 2, 2011 at 14:58, Karl Williamson
> <public_at_khwilliamson.com> wrote:
>
> I have two questions about this.
>
> 1) In UAX #44, it says for information about the Grapheme_Base
> property, to see UAX #29, but that document doesn't mention this
> property.
>
>
> The documentation on Grapheme_Base in #44 is obsolete. Grapheme_Base has
> not been used in the specification of grapheme clusters since (I
> believe) Unicode 3.2.
>
>
> 2) The definition in UAX #29 for both legacy and extended grapheme
> clusters effectively says that any Gc=Cn code points followed by any
> number of grapheme_extend code points is a grapheme cluster. Is
> that what is meant? I notice that Grapheme_Base excludes Cn code
> points.
>
>
> It doesn't say that. If you had the sequence <Control Extend>, you'd
> have a break between them according to the following rule:
> GB4. ( Control | CR | LF ) ÷
>
> It would result in two (degenerate) grapheme clusters.
>
> We need to fix the documentation to make this clearer. Could you let me
> know what led you to think that "any Gc=Cn code points followed by any
> number of grapheme_extend code points is a grapheme cluster" so that we
> can clarify that?

It says that an extended grapheme cluster matches this:
( CRLF
| Prepend* ( Hangul-syllable | !Control )
   ( Grapheme_Extend | Spacing_Mark)*
| . )

That tells me that one option for a grapheme cluster is a !Control
followed by any number of Grapheme_Extends.

Lower down it defines "Control" as
"General_Category = Line Separator (Zl), or
General_Category = Paragraph Separator (Zp), or
General_Category = Control (Cc), or
General_Category = Format (Cf)
and not U+000D CARRIAGE RETURN (CR)
and not U+000A LINE FEED (LF)
and not U+200C ZERO WIDTH NON-JOINER (ZWNJ)
and not U+200D ZERO WIDTH JOINER (ZWJ)"

By that definition of Control, all Gc=Cn code points are !Control.
Therefore a grapheme cluster can be a Cn followed by any number of
Grapheme_Extends.
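
To make that reading concrete, here is a rough Python sketch of the argument. It is only an illustration under stated assumptions, not the UAX #29 algorithm: is_control() transcribes the Control definition quoted above, is_extend_approx() stands in for Grapheme_Extend using only Gc=Mn/Me (the real property covers a few more code points, and Python's unicodedata does not expose it), and the Prepend, Hangul-syllable and Spacing_Mark parts of the pattern are left out. All function names are made up for this example.

```python
import unicodedata

# Control as quoted from UAX #29: Zl, Zp, Cc or Cf, minus CR, LF, ZWNJ, ZWJ.
CONTROL_CATEGORIES = {"Zl", "Zp", "Cc", "Cf"}
CONTROL_EXCEPTIONS = {"\r", "\n", "\u200c", "\u200d"}


def is_control(ch):
    """Return True if ch is Control under the quoted definition."""
    return (unicodedata.category(ch) in CONTROL_CATEGORIES
            and ch not in CONTROL_EXCEPTIONS)


def is_extend_approx(ch):
    """Assumption: approximate Grapheme_Extend by Gc=Mn or Gc=Me only."""
    return unicodedata.category(ch) in {"Mn", "Me"}


def matches_base_plus_extends(s):
    """True if s is one !Control code point followed by zero or more
    (approximate) Grapheme_Extend code points."""
    return (len(s) >= 1
            and not is_control(s[0])
            and all(is_extend_approx(c) for c in s[1:]))


# U+FFFF is a noncharacter, so its General_Category is Cn.
cn = "\uffff"
print(unicodedata.category(cn))                      # general category of U+FFFF
print(is_control(cn))                                # is it Control?
print(matches_base_plus_extends(cn + "\u0301\u0301"))  # Cn + two combining acutes
print(matches_base_plus_extends("\u0001\u0301"))       # Cc + combining acute
```

Under this reading the Cn code point is not Control, so the Cn-plus-Extends string matches the pattern, while the same string starting with a real control character (U+0001) does not.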
>
> Grapheme_Extend, on the other hand, is exactly equivalent to
> Grapheme_Cluster_Break=Extend.
>
Received on Mon Jul 04 2011 - 22:20:52 CDT