Re: Making sure I read UAX29 correctly

From: Mark Davis ⌛ (mark@macchiato.com)
Date: Thu Aug 27 2009 - 16:50:41 CDT

    Because SB5 doesn't match at the start of a region (here, right after the
    Sep), the Extend is not absorbed. You keep going through the rules and
    finally end up at SB12, not breaking. I agree that the wording is not as
    clear as it should be.

    I added an example to show that. Hover over the characters and breaks in #15
    in:

    http://unicode.org/~mdavis/SentenceBreakTest-5.2.0d16.html#samples
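
    The rule order described above can be sketched roughly as follows. This is
    a minimal illustration only, assuming the Unicode 5.2 rule numbering and
    covering just SB3 to SB5 and SB9 to SB12; the ATerm/SContinue exceptions
    (SB6 to SB8a) are omitted, so results for those classes are not reliable.
    The function names and the list-of-property-names input are hypothetical.

```python
PARA_SEP = {"Sep", "CR", "LF"}
TERM = {"STerm", "ATerm"}
IGNORED = {"Extend", "Format"}


def _is_break(left, right):
    """Decide the boundary between the last item of `left` and `right`."""
    last = left[-1]
    if last == "CR" and right == "LF":   # SB3: CR x LF
        return False
    if last in PARA_SEP:                 # SB4: (Sep | CR | LF) always breaks
        return True
    # Walk back over Sp* and then Close* to look for a preceding terminator.
    j = len(left)
    spaces = 0
    while j > 0 and left[j - 1] == "Sp":
        j -= 1
        spaces += 1
    while j > 0 and left[j - 1] == "Close":
        j -= 1
    if j > 0 and left[j - 1] in TERM:
        if spaces == 0 and right in {"Close", "Sp"} | PARA_SEP:  # SB9
            return False
        if right in {"Sp"} | PARA_SEP:                           # SB10
            return False
        # SB11. Its optional trailing (Sep | CR | LF) never reaches this
        # point, because SB4 above has already produced the break.
        return True
    return False                                                 # SB12


def sentence_breaks(props):
    """Return each index i where a break falls between props[i-1] and props[i].

    `props` is a list of Sentence_Break property names, one per character.
    """
    breaks = []
    kept = []  # properties to the left, after SB5 has absorbed Extend/Format
    for i, right in enumerate(props):
        if i > 0:
            # SB5: Extend/Format attaches to the preceding character and is
            # then ignored, unless it starts a region (follows Sep, CR or LF).
            if right in IGNORED and kept[-1] not in PARA_SEP:
                continue
            if _is_break(kept, right):
                breaks.append(i)
        kept.append(right)
    return breaks
```

    For <STerm Sep Extend | Numeric> this sketch reports a single break, after
    the Sep (SB4), and none at the position marked |. The Extend that follows
    the Sep stays visible to the later rules, so SB11 no longer sees a
    terminator context and SB12 applies.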

    Note to all: we can add additional examples that illustrate tricky cases;
    Peter Edberg and Laurentiu Iancu have actions to do so, but we can take
    additional ones if they are sent in very soon. Here are the other current
    samples:

    http://unicode.org/~mdavis/GraphemeBreakTest-5.2.0d16.html#samples (none
    currently)
    http://unicode.org/~mdavis/LineBreakTest-5.2.0d16.html#samples
    http://unicode.org/~mdavis/WordBreakTest-5.2.0d16.html#samples

    Mark

    On Thu, Aug 27, 2009 at 13:32, Eric Muller <emuller@adobe.com> wrote:

    > Eric Muller wrote:
    >
    > > When considering whether there is a sentence break according to SB12 (but
    > > not
    >
    > -> SB11
    >
    > > SB4), is it correct that the Sep, CR and LF have to be understood as
    > > "possibly followed by any number of Extend or Format"?
    > >
    > > In other words, in a string <Sep EX | Numeric>, is the position | a
    > > boundary or not?
    >
    > -> <STerm Sep Extend | Numeric>
    >
    > Eric.
    >
    > (PS: yes I am having a hard time with sentence break)


