Question about the Sentence_Break property
richard.wordingham at ntlworld.com
Thu Feb 19 23:14:30 CST 2015
On Thu, 19 Feb 2015 19:55:20 -0700
Karl Williamson <public at khwilliamson.com> wrote:
> UAX 29 says this:
> Break after paragraph separators.
> SB4. Sep | CR | LF
> Why are CR and LF considered to be paragraph separators? NEL and
> LINE SEPARATOR (U+2028) are as well.
> My mental model of plain text has it containing embedded characters,
> which I'll call \n, to allow it to be displayed in a terminal window
> of a given width. Not all text is like that, of course, but there is
> an awful lot that is. This rule makes no sense to me.
There are two types of plain text: that which requires explicit
line-breaking, and that which does not. This is a case where a
non-linguistic tailoring is required.
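For illustration, here is a minimal Python sketch of what the quoted default rules do on their own. It assumes that only SB3 (no break inside a CR LF pair) and SB4 are modelled; the ordinary sentence-terminator rules are ignored, and the function name forced_sentence_breaks is hypothetical. Run on hard-wrapped text, it marks every line end as a forced sentence boundary, which is why such text needs the tailoring.

```python
# Characters with Sentence_Break=Sep in UAX #29:
# NEL (U+0085), LINE SEPARATOR (U+2028), PARAGRAPH SEPARATOR (U+2029).
SEP = {"\u0085", "\u2028", "\u2029"}
CR, LF = "\r", "\n"


def forced_sentence_breaks(text: str) -> list[int]:
    """Return the offsets at which SB3/SB4 alone force a sentence break.

    Hypothetical helper: only the hard rules are modelled; the
    ATerm/STerm rules that find ordinary sentence ends are not.
    """
    breaks = []
    for i, ch in enumerate(text):
        if ch == CR and i + 1 < len(text) and text[i + 1] == LF:
            continue  # SB3: CR x LF, so no break inside the pair
        if ch in SEP or ch in (CR, LF):
            breaks.append(i + 1)  # SB4: break after Sep | CR | LF
    return breaks
```

For a two-line, hard-wrapped sentence such as "This is one\nsentence.", the sketch reports a break after the newline even though no sentence has ended there.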
TUS has a whole section on the issue, namely TUS 7.0.0 Section 5.8,
"Newline Guidelines".
One thing that is missing is mention of the convention that a single
newline character (or CRLF pair) is a line break whereas a doubled
newline character denotes a paragraph break.
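A rough sketch of how that convention might be applied as a tailoring before sentence segmentation. It assumes a paragraph break is two or more newlines (LF or CRLF) with at most spaces or tabs between them, and that a single newline inside a paragraph can simply be replaced by a space. The helper name paragraphs is hypothetical; neither UAX #29 nor TUS defines it.

```python
import re

# Paragraph break (assumed convention): two or more newlines, LF or CRLF,
# allowing spaces or tabs on the otherwise blank lines between them.
PARA_BREAK = re.compile(r"(?:\r?\n[ \t]*){2,}")
# A single newline inside a paragraph, with any surrounding blanks,
# treated as a soft line break rather than a separator.
LINE_BREAK = re.compile(r"[ \t]*\r?\n[ \t]*")


def paragraphs(text: str) -> list[str]:
    """Split hard-wrapped text into paragraphs and unwrap soft line breaks."""
    return [
        LINE_BREAK.sub(" ", para)
        for para in PARA_BREAK.split(text.strip())
        if para
    ]
```

Each returned paragraph contains no CR or LF, so running the default sentence rules on it no longer produces a forced break at every wrapped line end.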