From: Philippe Verdy (verdy_p@wanadoo.fr)
Date: Tue Sep 25 2007 - 16:09:36 CDT
Rick McGowan wrote:
> There is also a draft 2 version of the proposed update of UAX #29: Text
> Boundaries. This update adds CR, LF, Extend, and Control as needed,
> clarifies use of "Any", updates MidLetter to include U+2018, and adds a
> new kind of grapheme cluster: extended combining character sequences. See
> http://www.unicode.org/reports/tr29/tr29-12.html
This update removes CR and LF from the "Sep" class (in Sentence
boundaries, i.e. Table 4 for Sentence_Break property values), but when I
look at the document, I don't see any place where "Sep" is accepted
without CR and LF alongside it, so we now see "Sep | CR | LF" in many
"SB*" rules.
What is the purpose of this exclusion? It does not seem to change anything
in the intended result (and it does not change the existing rule for
the non-breakable sequence CR followed by LF).
Did you forget something in the document, where an instance of "Sep"
accepted in some rule would not match either CR or LF? Or is it done for
clarity (I'm not sure that this change really clarifies anything)?
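
To make the observation above checkable, here is a minimal sketch of how one might scan a set of rules for any use of "Sep" that is not immediately paired with "| CR | LF". The rule strings and the helper name rules_with_bare_sep are assumptions for illustration: they are written in the style of the SB* rules but are not copied from the draft, so the real text would have to be substituted before drawing any conclusion.

```python
import re

# Hypothetical, simplified rule strings in the style of the SB* rules.
# They are illustrative only and are NOT copied from the draft.
RULES = {
    "SB3": "CR × LF",
    "SB4": "Sep | CR | LF ÷",
    "SB9": "ATerm Close* × (Close | Sp | Sep | CR | LF)",
    "SB11": "ATerm Close* Sp* (Sep | CR | LF)? ÷",
}

# Any occurrence of the class name Sep.
SEP = re.compile(r"\bSep\b")
# Sep immediately followed by the alternatives CR and LF.
SEP_WITH_NEWLINES = re.compile(r"\bSep\s*\|\s*CR\s*\|\s*LF\b")


def rules_with_bare_sep(rules):
    """Return names of rules where Sep occurs without '| CR | LF' after it."""
    bare = []
    for name, text in rules.items():
        total = len(SEP.findall(text))
        paired = len(SEP_WITH_NEWLINES.findall(text))
        if total > paired:
            bare.append(name)
    return bare


if __name__ == "__main__":
    print(rules_with_bare_sep(RULES))
```

If the list comes back empty for the actual rule text, then splitting CR and LF out of Sep changes no rule's behaviour, which is the point of the question; a non-empty list would name the rules where the split matters.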