Re: 9.0.0 segmentation and line breaks on the empty string

From: Andy Heninger <andy.heninger_at_gmail.com>
Date: Mon, 20 Jun 2016 15:32:12 -0700

>
> I notice that in 9.0.0, UAX29 segmentations no longer report boundaries on
> the empty string while UAX14 still does

This is an interesting edge case.

My reading of UAX 14 is that an empty string would not produce a break.
Both "sot" and "eot" would be true, so LB2,
    sot ×
would match and apply, and that would be the end of the story. LB3 would
never be applied because LB2 would match first.
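A minimal Python sketch of this reading. It models only LB2 and LB3, and it assumes first-match-wins rule order, so the first rule whose condition holds decides the boundary. The names `edge_rule` and `edge_decisions` and the `lb3_first` switch are hypothetical, made up for illustration; they come from no library. The pairwise rules LB4 to LB31 are not modeled at all.

```python
from typing import List, Optional

NO_BREAK = "\u00d7"    # "×", the result of LB2 (sot ×)
MANDATORY_BREAK = "!"  # the result of LB3 (! eot)


def edge_rule(pos: int, length: int, lb3_first: bool = False) -> Optional[str]:
    """Apply only LB2 and LB3 at boundary position `pos` of a text of `length`.

    Returns None when neither rule matches. The decision would then fall to
    rules LB4..LB31, which this sketch does not model.
    `lb3_first` is a hypothetical switch that swaps the rule order, showing
    what an implementation testing LB3 before LB2 would report.
    """
    at_sot = pos == 0
    at_eot = pos == length
    rules = [(at_sot, NO_BREAK), (at_eot, MANDATORY_BREAK)]
    if lb3_first:
        rules.reverse()
    # The first rule whose condition holds decides; later rules are never consulted.
    for matches, decision in rules:
        if matches:
            return decision
    return None


def edge_decisions(text: str, lb3_first: bool = False) -> List[Optional[str]]:
    # A text of n code points has n + 1 boundary positions, 0..n.
    n = len(text)
    return [edge_rule(pos, n, lb3_first) for pos in range(n + 1)]


print(edge_decisions(""))                  # ['×']  LB2 wins: no break
print(edge_decisions("", lb3_first=True))  # ['!']  LB3 wins: hard break
print(edge_decisions("ab"))                # ['×', None, '!']
```

For the empty string, the single boundary position is both sot and eot, so rule order alone decides the outcome. Under the spec's order, LB2 wins and no break is reported.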

As to mandating a hard break at the end of text (LB3), I'm not at all sure
this was a good idea. It seems like the breaking behavior would depend on
the external context of the text, about which the LB algorithm knows
nothing. It's different from having text that ends with a LF or other
hard-break character. But I'm also disinclined to suggest changes in this
area; the possibility of breaking applications that have come to expect the
existing behavior seems real, and it's all edge cases.

  -- Andy

On Sun, Jun 19, 2016 at 9:34 AM, Daniel Bünzli <daniel.buenzli_at_erratique.ch>
wrote:

> Le dimanche, 19 juin 2016 à 16:57, Karl Williamson a écrit :
> > Yes. Use http://www.unicode.org/reporting.html to make an error report.
>
> Thanks, did that.
>
> Best,
>
> Daniel
Received on Mon Jun 20 2016 - 17:34:02 CDT
