Re: Potential contradiction between the WordBreak test data and UAX #29

From: Philippe Verdy <verdy_p_at_wanadoo.fr>
Date: Wed, 23 Nov 2016 12:20:44 +0100

2016-11-23 12:00 GMT+01:00 Tom Hacohen <tom_at_osg.samsung.com>:

>
> Also take another look at
> http://www.unicode.org/reports/tr29/#Grapheme_Cluster_and_Format_Rules
> specifically the table that
> shows another way of writing the ignore rule. This again shows my
> understanding of rule 4 is correct.
>
> In particular, look at the following equivalence:
> X Y × Z W ⇒ X (Extend | Format)* Y (Extend | Format)* × Z (Extend | Format)* W
>

This expansion only takes effect from rule WB4 onward; it cannot be used to
transform rules WB1 to WB3c, as the algorithm states explicitly. Because
rule WB3c already handles your case, you are misreading the spec by
treating the expansion as if it applied there too.
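
A rough sketch of that ordering in Python, assuming rules are modeled as
plain token lists. The names expand, effective_rules and RULES are
hypothetical, chosen for illustration only; they are not part of UAX #29 or
of any library. The WB3c and WB5 rule text is quoted from memory of the
Unicode 9.0 text and should be checked against the spec. Only rules listed
after WB4 get the (Extend | Format)* sequence inserted:

```python
EXPANSION = "(Extend | Format)*"
OPERATORS = {"×", "÷"}


def expand(tokens):
    """Insert the ignore sequence after every operand except the last one."""
    last_operand = max(i for i, t in enumerate(tokens) if t not in OPERATORS)
    out = []
    for i, tok in enumerate(tokens):
        out.append(tok)
        if tok not in OPERATORS and i != last_operand:
            out.append(EXPANSION)
    return out


def effective_rules(rules, ignore_rule="WB4"):
    """Expand only the rules that come after the ignore rule, in order."""
    seen_ignore = False
    result = {}
    for name, tokens in rules:
        result[name] = expand(tokens) if seen_ignore else list(tokens)
        if name == ignore_rule:
            seen_ignore = True
    return result


# Hypothetical, heavily abbreviated rule table (assumption: Unicode 9.0 wording).
RULES = [
    ("WB3", ["CR", "×", "LF"]),
    ("WB3c", ["ZWJ", "×", "(Glue_After_Zwj | EBG)"]),
    ("WB4", ["(ignore rule)"]),  # placeholder; this entry is never expanded
    ("WB5", ["AHLetter", "×", "AHLetter"]),
]

if __name__ == "__main__":
    for name, tokens in effective_rules(RULES).items():
        print(name, " ".join(tokens))
```

With this table, WB3 and WB3c come out unchanged, while WB5 becomes
AHLetter (Extend | Format)* × AHLetter. The same expand function applied to
X Y × Z W reproduces the equivalence quoted above.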
Received on Wed Nov 23 2016 - 05:21:20 CST