Re: Public Review Issues update: UAX #31

From: Mark Davis (mark.davis@icu-project.org)
Date: Tue Sep 25 2007 - 18:22:17 CDT


    Someone complained that while CR and LF are used in the rules, they were not
    defined in the table of values for Sentence Break. So the UTC decided to
    make this change for consistency. It doesn't make any change in the actual
    outcome. (Personally, I don't think it was of much value, but don't feel
    that strongly against it either.)
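
    To illustrate why the outcome is unchanged, here is a minimal sketch in
    Python. It is not taken from UAX #29: it models only SB3 (no break
    between CR and LF) and SB4 (break after a separator), and it treats every
    other position as "no break", ignoring SB5 through SB11. The function
    names, the OLD_SEP/NEW_SEP sets, and the sample string are hypothetical
    and exist only for this illustration.

```python
# Sketch only: SB3 and SB4 under two property tables. All names here are
# hypothetical; the real rules in UAX #29 cover many more cases.

# Earlier table (assumed): Sep includes CR and LF.
OLD_SEP = {"\r", "\n", "\u0085", "\u2028", "\u2029"}
# Proposed table (assumed): CR and LF are separate values, Sep is the rest.
NEW_SEP = {"\u0085", "\u2028", "\u2029"}


def breaks_old(text):
    """Break offsets when CR and LF are members of Sep."""
    out = []
    for i in range(1, len(text)):
        prev, cur = text[i - 1], text[i]
        if prev == "\r" and cur == "\n":  # SB3: CR x LF
            continue
        if prev in OLD_SEP:               # SB4: Sep /
            out.append(i)
    return out


def new_class(ch):
    """Sentence_Break value under the proposed table (simplified)."""
    if ch == "\r":
        return "CR"
    if ch == "\n":
        return "LF"
    if ch in NEW_SEP:
        return "Sep"
    return "Other"


def breaks_new(text):
    """Break offsets when the rules say 'Sep | CR | LF' explicitly."""
    out = []
    for i in range(1, len(text)):
        prev, cur = new_class(text[i - 1]), new_class(text[i])
        if prev == "CR" and cur == "LF":    # SB3: CR x LF
            continue
        if prev in ("Sep", "CR", "LF"):     # SB4: (Sep | CR | LF) /
            out.append(i)
    return out


sample = "One.\r\nTwo.\u2028Three\rFour\n\nFive"
assert breaks_old(sample) == breaks_new(sample)
```

    Under these assumptions both formulations give the same break offsets,
    because every rule that used to say "Sep" now says "Sep | CR | LF".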

    Mark

    On 9/25/07, Philippe Verdy <verdy_p@wanadoo.fr> wrote:
    >
    > In fact, after rereading this document carefully, I see why you wanted to
    > change something here for sentence breaks: the initial document did not
    > say explicitly that the CRLF sequence was followed by a break.
    >
    > But the proposed update does not change anything. There is currently no
    > need to exclude CR and LF from the "Sep" class, and the changes to the
    > rules from SB4 onward, where Sep occurs, have no effect: the rule that
    > prohibits a break between CR and LF is SB3, and the prior rules SB1 and
    > SB2 (start and end of text) have no impact on CR, LF, and "Sep".
    >
    > I really can't see any rationale for this change, except adding to the
    > confusion of users with different versions of this document. That's why I
    > suspect that something was forgotten, but I really wonder what it could be!
    > For me, the rules in SB3 and SB4 are enough and do not require any further
    > change in the following rules where "Sep" occurs, or any exclusion of CR
    > and LF from the "Sep" class.
    >
    > > -----Original Message-----
    > > From: Philippe Verdy [mailto:verdy_p@wanadoo.fr]
    > > Sent: Tuesday, September 25, 2007 23:10
    > > To: 'Rick McGowan'; 'unicode@unicode.org'
    > > Subject: RE: Public Review Issues update: UAX #31
    > >
    > > Rick McGowan wrote:
    > > > There is also a draft 2 version of the proposed update of UAX #29: Text
    > > > Boundaries. This update adds CR, LF, Extend, and Control as needed,
    > > > clarifies use of "Any", updates MidLetter to include U+2018, and adds a
    > > > new kind of grapheme cluster: extended combining character sequences.
    > > > See http://www.unicode.org/reports/tr29/tr29-12.html
    > >
    > > This update removes CR and LF from the "Sep" class (in sentence
    > > boundaries, i.e. Table 4 for Sentence_Break property values), but when I
    > > look at the document, I don't see any place where "Sep" is not accepted
    > > along with CR and LF, so we now see "Sep | CR | LF" in many "SB*" rules.
    > >
    > > What is the purpose of this exclusion? It does not seem to change
    > > anything in the intended result (and it does not change the existing
    > > rules regarding the non-breakable sequence CR followed by LF).
    > >
    > > Did you forget something in the document where an accepted instance of
    > > "Sep" in some rule would not match either CR or LF? Or is it made for
    > > clarity (I'm not sure that this change really clarifies anything)?
    >

    


    This archive was generated by hypermail 2.1.5 : Tue Sep 25 2007 - 18:24:31 CDT