Where UAX #9 discusses the paragraph level, it says:
> Paragraphs are divided by the Paragraph Separator or appropriate Newline
> Function (for guidelines on the handling of CR, LF, and CRLF, see *Section
> 4.4, Directionality*, and *Section 5.8, Newline Guidelines* of [Unicode
> <http://www.unicode.org/reports/tr41/tr41-15.html#Unicode>]). Paragraphs
> may also be determined by higher-level protocols: for example, the text in
> two different cells of a table will be in different paragraphs.
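
To make the quoted rule concrete, here is a rough sketch in Python of how one
might divide plain text into paragraphs in that sense. It assumes that
"Paragraph Separator or appropriate Newline Function" can be approximated by
the characters whose Bidi_Class is B, and that CRLF counts as one separator.
The function name split_paragraphs is mine, not anything from UAX #9, and
higher-level protocols (table cells and so on) are not covered at all.

```python
import unicodedata


def split_paragraphs(text):
    """Split text at characters whose Bidi_Class is B (Paragraph Separator).

    Assumption: a CR immediately followed by LF is one separator.
    The separators themselves are dropped from the returned paragraphs.
    """
    paragraphs = []
    start = 0
    i = 0
    while i < len(text):
        if unicodedata.bidirectional(text[i]) == "B":
            # Consume CR LF as a single separator, anything else as one char.
            if text[i] == "\r" and text[i + 1:i + 2] == "\n":
                end = i + 2
            else:
                end = i + 1
            paragraphs.append(text[start:i])
            start = i = end
        else:
            i += 1
    # Keep trailing text that is not followed by a separator.
    if start < len(text):
        paragraphs.append(text[start:])
    return paragraphs
```

Note that under this reading every single LF or CR ends a paragraph, and
U+2028 LINE SEPARATOR does not, since its Bidi_Class is WS rather than B.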
Regards,
Konstantin
2015-02-21 3:56 GMT+04:00 Philippe Verdy <verdy_p_at_wanadoo.fr>:
> 2015-02-20 6:14 GMT+01:00 Richard Wordingham <
> richard.wordingham_at_ntlworld.com>:
>
>> TUS has a whole section on the issue, namely TUS 7.0.0 Section 5.8.
>> One thing that is missing is mention of the convention that a single
>> newline character (or CRLF pair) is a line break whereas a doubled
>> newline character denotes a paragraph break.
>>
>
> In that case CR or LF characters alone are not "paragraph separators" by
> themselves unless they are grouped together. Like NEL, they should just be
> considered as line separators and the terminology used in UAX 29 rule SB4
> is effectively incorrect if what matters here is just the linebreak
> property. And also in that case, the SB4 rule should effectively include
> NEL (from the C1 subset).
>
> But as SB4 is only related to sentence breaking, it would be a problem
> because simple linebreaks are used extremely frequently in the middle of
> sentences.
>
> What the Sentence break algorithm should say is that there should first be
> a preprocessing step separating line breaks and paragraph breaks (creating
> custom entities, similar to collation elements but encoded internally with
> a code point outside the standard space) that rule SB4 would use
> instead of "Sep | CR | LF". That custom entity should be "Sep", but without
> a rule defining it, as there are various ways to represent paragraph
> breaks.
>
>
> _______________________________________________
> Unicode mailing list
> Unicode_at_unicode.org
> http://unicode.org/mailman/listinfo/unicode
>
>
Received on Fri Feb 20 2015 - 20:52:57 CST
This archive was generated by hypermail 2.2.0 : Fri Feb 20 2015 - 20:52:58 CST