Re: Questions about UAX #29

From: Mark Davis ☕ <mark_at_macchiato.com>
Date: Sun, 3 Jul 2011 16:52:45 -0700

Mark
*— Il meglio è l’inimico del bene —*

On Sat, Jul 2, 2011 at 14:58, Karl Williamson <public_at_khwilliamson.com> wrote:

> I have two questions about this.
>
> 1) In UAX #44, it says for information about the Grapheme_Base property, to
> see UAX #29, but that document doesn't mention this property.
>

The documentation on Grapheme_Base in #44 is obsolete. Grapheme_Base has not
been used in the specification of grapheme clusters since (I believe)
Unicode 3.2.

>
> 2) The definition in UAX #29 for both legacy and extended grapheme clusters
> effectively says that any Gc=Cn code points followed by any number of
> grapheme_extend code points is a grapheme cluster. Is that what is meant?
> I notice that Grapheme_Base excludes Cn code points.
>

It doesn't say that. If you had the sequence <Control Extend>, you'd have a
break between them according to the following rule:
GB4. (Control | CR | LF) ÷
It would result in two (degenerate) grapheme clusters.
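To make the rule ordering concrete, here is a minimal sketch in Python. It implements only a handful of the UAX #29 rules (CR × LF, GB4, GB5, × Extend, and the catch-all Any ÷ Any) and uses a tiny hand-picked property table; the names GCB_SUBSET, gcb, is_break and grapheme_clusters are hypothetical, and a real segmenter would need the full GraphemeBreakProperty.txt data and the remaining rules. Because GB4 is checked before the × Extend rule, a Control code point followed by an Extend code point still splits into two clusters.

```python
# Simplified sketch of a few UAX #29 grapheme cluster rules.
# ASSUMPTION: the property table is a tiny illustrative subset, not real UCD data.
GCB_SUBSET = {
    0x000D: "CR",
    0x000A: "LF",
    0x0001: "Control",  # a C0 control character (Gc=Cc)
    0x0301: "Extend",   # COMBINING ACUTE ACCENT
}


def gcb(cp: int) -> str:
    """Return the (subset) Grapheme_Cluster_Break value, defaulting to 'Any'."""
    return GCB_SUBSET.get(cp, "Any")


def is_break(before: str, after: str) -> bool:
    """Decide whether there is a boundary between two adjacent code points."""
    if before == "CR" and after == "LF":       # CR × LF
        return False
    if before in ("Control", "CR", "LF"):      # GB4: (Control | CR | LF) ÷
        return True
    if after in ("Control", "CR", "LF"):       # GB5: ÷ (Control | CR | LF)
        return True
    if after == "Extend":                      # × Extend
        return False
    return True                                # catch-all: Any ÷ Any


def grapheme_clusters(text: str) -> list[str]:
    """Split text into clusters using only the pairwise rules above."""
    clusters: list[str] = []
    for ch in text:
        if clusters and not is_break(gcb(ord(clusters[-1][-1])), gcb(ord(ch))):
            clusters[-1] += ch   # no boundary: extend the current cluster
        else:
            clusters.append(ch)  # boundary: start a new cluster
    return clusters
```

For example, grapheme_clusters("\x01\u0301") returns ['\x01', '\u0301'], the two degenerate clusters, while grapheme_clusters("e\u0301") returns a single cluster.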

We need to fix the documentation to make this clearer. Could you let me know
what led you to think that "any Gc=Cn code points followed by any number of
grapheme_extend code points is a grapheme cluster" so that we can clarify
that?

Grapheme_Extend, on the other hand, is exactly equivalent to
Grapheme_Cluster_Break=Extend.

Received on Sun Jul 03 2011 - 18:58:03 CDT