Re: UAX 29 9.0.0 new emoji flag rules questions and comments from Mark Davis ☕️ on 2016-06-22 (Unicode Mail List Archive)

From: Mark Davis ☕️ <mark_at_macchiato.com>
Date: Wed, 22 Jun 2016 13:32:43 +0200

That wouldn't work. The process works by taking each offset, and walking
through all the rules, using the first one that matches.

So with your rules and the following input:

RI RI RI RI RI RI

You'd get a break at any offset with at least 2 RI on both the left and the
right, and no break everywhere else, thus:

RI x RI ÷ RI ÷ RI ÷ RI x RI
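
The first-match process described above can be sketched in a few lines of Python. This is only an illustration, not code from UAX 29: the names RULES, DEFAULT, mark_at and render are made up for the example, the rule table holds just the two proposed rules, and the fallback of "break" when nothing matches is an assumption that never triggers for an all-RI input.

```python
# Each rule is (left context, mark, right context); rules are tried in order
# and the first one whose contexts match at the offset decides the result.
RULES = [
    (("RI", "RI"), "÷", ("RI", "RI")),  # proposed rule: RI RI ÷ RI RI
    (("RI",), "x", ("RI",)),            # proposed rule: RI x RI
]
DEFAULT = "÷"  # assumed fallback when no rule matches


def mark_at(units, offset, rules=RULES):
    """Return the mark for the boundary just before units[offset]."""
    for left, mark, right in rules:
        # Slices that run off either end come out too short and fail to match.
        before = tuple(units[max(0, offset - len(left)):offset])
        after = tuple(units[offset:offset + len(right)])
        if before == left and after == right:
            return mark
    return DEFAULT


def render(units):
    """Interleave the units with the mark chosen at each interior offset."""
    out = [units[0]]
    for offset in range(1, len(units)):
        out += [mark_at(units, offset), units[offset]]
    return " ".join(out)


print(render(["RI"] * 6))
```

Each offset is decided independently from its local context, so the rules have no way of knowing whether an even or odd number of RI precedes the offset, which is what pairing flags would require.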

Mark

On Wed, Jun 22, 2016 at 1:10 PM, Daniel Bünzli <daniel.buenzli_at_erratique.ch>
wrote:

>
> On Wednesday, 22 June 2016 at 01:32, Laurentiu Iancu wrote:
> > Re #1, the ^ symbol indeed denotes a start-of-line anchor, in usual
> > regex notation, and the corresponding rules could use sot instead.
>
> By the way it seems to me that an equivalent formulation of GB12/GB13 and
> WB15/WB16 would be to have the sequence of rules:
>
> RI RI ÷ RI RI
> RI x RI
>
> This fits particularly well in the case of word breaking since you already
> need as much context as this because of the rules WB{6,7,11,12}. It also
> avoids regexps and negation.
>
> Best,
>
> Daniel
>
>
Received on Wed Jun 22 2016 - 06:33:31 CDT
