That wouldn't work. The process works by taking each offset, and walking
through all the rules, using the first one that matches.
So with your rules and the following input:
RI RI RI RI RI RI
You'd get that any offset with at least 2 RI on both the left and the right
would have a break, and everything else would have no break, thus:
RI x RI ÷ RI ÷ RI ÷ RI x RI
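
A minimal sketch of the first-match procedure described above, for readers who want to reproduce the result. The rule encoding (left context, right context, action), the names `RULES`, `decide` and `segment`, and the fallback to a break when no rule matches are all assumptions made for illustration; they are not taken from the UAX #29 text or any library.

```python
BREAK, NO_BREAK = "÷", "x"

# Hypothetical encoding of the two proposed rules, in priority order:
# (classes required just before the offset, classes required just after, action)
RULES = [
    (("RI", "RI"), ("RI", "RI"), BREAK),  # RI RI ÷ RI RI
    (("RI",), ("RI",), NO_BREAK),         # RI x RI
]


def decide(classes, offset, rules=RULES):
    """Return the action of the first rule that matches at `offset`,
    i.e. at the boundary between classes[offset - 1] and classes[offset]."""
    for left, right, action in rules:
        before = tuple(classes[max(0, offset - len(left)):offset])
        after = tuple(classes[offset:offset + len(right)])
        if before == left and after == right:
            return action
    return BREAK  # assumed fallback, analogous to "Any ÷ Any"


def segment(classes, rules=RULES):
    """Render the sequence with the chosen action at every interior offset."""
    if not classes:
        return ""
    out = [classes[0]]
    for offset in range(1, len(classes)):
        out.append(decide(classes, offset, rules))
        out.append(classes[offset])
    return " ".join(out)


print(segment(["RI"] * 6))
```

Because each offset is decided independently and only from its local context, the two rules cannot count whether an even or odd number of RI precede the offset. That is why the six-RI input does not come out as the pairs RI x RI ÷ RI x RI ÷ RI x RI that GB12/GB13 are meant to produce.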
Mark
On Wed, Jun 22, 2016 at 1:10 PM, Daniel Bünzli <daniel.buenzli_at_erratique.ch>
wrote:
>
> On Wednesday, 22 June 2016 at 01:32, Laurentiu Iancu wrote:
> > Re #1, the ^ symbol indeed denotes a start-of-line anchor, in usual
> > regex notation, and the corresponding rules could use sot instead.
>
> By the way it seems to me that an equivalent formulation of GB12/GB13 and
> WB15/WB16 would be to have the sequence of rules:
>
> RI RI ÷ RI RI
> RI x RI
>
> This fits particularly well in the case of word breaking since you already
> need as much context as this because of the rules WB{6,7,11,12}. It also
> avoids regexps and negation.
>
> Best,
>
> Daniel
>
>
Received on Wed Jun 22 2016 - 06:33:31 CDT