Re: about P1 part of BIDI algorithm

From: Eli Zaretskii <eliz_at_gnu.org>
Date: Mon, 10 Oct 2011 08:10:19 -0400

> Date: Mon, 10 Oct 2011 17:47:21 +0800
> From: li bo <libo.imc_at_gmail.com>
>
> *P1. Split the text into separate paragraphs. A paragraph separator is kept
> with the previous paragraph. Within each paragraph, apply all the other
> rules of this algorithm.*
>
> Here, what does paragraph mean? Which symbols can *split the text into
> separate paragraphs*?

From section 3:

  Paragraphs are divided by the Paragraph Separator or appropriate
  Newline Function (for guidelines on the handling of CR, LF, and CRLF,
  see Section 4.4, Directionality, and Section 5.8, Newline Guidelines
  of [Unicode]). Paragraphs may also be determined by higher-level
  protocols: for example, the text in two different cells of a table
  will be in different paragraphs.

> I think only 'Enter' and '*Paragraph separator*' can do paragraph breaking.

In addition to the Paragraph Separator, _any_ newline function (LF,
CR+LF, CR, or NEL) can end a paragraph. Also U+2028, the LS
character. See section 5.8 of the Unicode Standard cited above.

IOW, from the UBA point of view, each line is a separate "paragraph".
Or at least this is my interpretation of the UBA ;-)
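
To make the above concrete, here is a minimal Python sketch of that
reading of P1: split on the separators listed above and keep each
separator with the paragraph it ends. The names split_paragraphs and
include_ls are made up for illustration and are not part of UAX#9.
Treating U+2028 as a paragraph break is part of the interpretation
above, so the sketch makes it a switch that can be turned off.

```python
import re

# Separators named in the reply above (illustrative constants).
# "\r\n" comes first so that CR+LF is matched as a single separator.
NEWLINE_FUNCTIONS = ("\r\n", "\n", "\r", "\x85")  # CR+LF, LF, CR, NEL
PARAGRAPH_SEPARATOR = "\u2029"                     # PS
LINE_SEPARATOR = "\u2028"                          # LS


def split_paragraphs(text, include_ls=True):
    """Split text into paragraphs, keeping each separator with the
    paragraph it terminates (as P1 requires)."""
    seps = list(NEWLINE_FUNCTIONS) + [PARAGRAPH_SEPARATOR]
    if include_ls:
        seps.append(LINE_SEPARATOR)
    sep = "|".join(re.escape(s) for s in seps)
    # Either the shortest run of text that ends in a separator, or the
    # trailing text that has no separator after it.
    pattern = re.compile(rf".*?(?:{sep})|.+\Z", re.DOTALL)
    return pattern.findall(text)


if __name__ == "__main__":
    for para in split_paragraphs("abc\r\ndef\nghi"):
        print(repr(para))
```

Every paragraph returned this way would then go through the remaining
rules of the algorithm on its own.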

> what's the meaning of 'appropriate Newline Functions' and 'higher-level
> protocol paragraph determination'?

Newline Function (NLF) is described in Section 5.8 of the Unicode
Standard. Higher-level protocols are described in section 4.3 of
UAX#9. In a nutshell, your application can have its own ideas of what
begins and what ends a paragraph, and you are allowed to use those
rules instead of what P1 says.
Received on Mon Oct 10 2011 - 07:10:19 CDT