Re: about P1 part of BIDI algorithm from Eli Zaretskii on 2011-10-11 (Unicode Mail List Archive)

From: Eli Zaretskii <eliz_at_gnu.org>
Date: Tue, 11 Oct 2011 03:27:53 -0400

> Date: Tue, 11 Oct 2011 10:29:51 +0900
> From: "Martin J. Dürst" <duerst_at_it.aoyama.ac.jp>
> CC: li bo <libo.imc_at_gmail.com>, unicode_at_unicode.org
>
> >> From section 3:
> >
> > Paragraphs are divided by the Paragraph Separator or appropriate
> > Newline Function (for guidelines on the handling of CR, LF, and CRLF,
> > see Section 4.4, Directionality, and Section 5.8, Newline Guidelines
> > of [Unicode]). Paragraphs may also be determined by higher-level
> > protocols: for example, the text in two different cells of a table
> > will be in different paragraphs.
> >
> >
> >> I think only 'Enter' and '*Paragraph separator*' can do paragraph breaking.
> >
> > In addition to the Paragraph Separator, _any_ newline function (LF,
> > CR+LF, CR, or NEL) can end a paragraph. Also U+2028, the LS
> > character. See section 5.8 of the Unicode Standard cited above.
>
> No, U+2028 (LS) is explicitly *not* a Paragraph Separator.

I stand corrected. I see now that its bidi type is WS, not B. I
guess now I know why I didn't handle it as a Newline in the Emacs
implementation of the UBA; I always wondered why I didn't ;-)
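
For anyone who wants to check the bidi types themselves, here is a small
sketch using Python's standard unicodedata module. The table of names and
the two helper functions are my own illustration, not anything taken from
UAX#9 or from the Emacs code, and the values reported depend on the Unicode
version bundled with the Python build.

```python
import unicodedata

# Line-breaking characters discussed in Section 5.8 (names are just labels
# chosen for this sketch).
NEWLINE_FUNCTIONS = {
    "CR": "\u000D",
    "LF": "\u000A",
    "NEL": "\u0085",
    "VT": "\u000B",
    "FF": "\u000C",
    "LS": "\u2028",
    "PS": "\u2029",
}


def bidi_classes():
    """Map each label to the Bidi_Class reported by unicodedata."""
    return {name: unicodedata.bidirectional(ch)
            for name, ch in NEWLINE_FUNCTIONS.items()}


def paragraph_breakers():
    """Labels whose Bidi_Class is B, i.e. Paragraph Separator for the UBA."""
    return sorted(name for name, cls in bidi_classes().items() if cls == "B")


if __name__ == "__main__":
    for name, cls in bidi_classes().items():
        print(f"{name}: {cls}")
```

Only the characters whose class comes out as B end a paragraph as far as
the UBA is concerned; LS should show up as WS.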

Anyway, I think UAX#9 should make it more explicit which of the
Newline Function characters divide paragraphs. By referencing section
5.8 of the Unicode standard it makes this issue confusing, because LS
is described there as one of the Newline Functions.
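
To make the distinction concrete, here is a rough sketch of how the
paragraph split in P1 could look if it is driven purely by bidi type B.
The function name is made up for this example, and two things are
assumptions on my part: that CR followed by LF counts as a single
separator, and that no higher-level protocol is overriding the paragraph
boundaries. Note that it also breaks on U+001C..U+001E, which have type B
as well.

```python
import unicodedata


def split_bidi_paragraphs(text):
    """Split text after each character of bidi type B.

    The separator is kept with the paragraph it ends. CR LF is treated
    as one separator (an assumption of this sketch).
    """
    paragraphs = []
    start = 0
    i = 0
    n = len(text)
    while i < n:
        if unicodedata.bidirectional(text[i]) == "B":
            end = i + 1
            # Do not split between CR and LF.
            if text[i] == "\r" and end < n and text[end] == "\n":
                end += 1
            paragraphs.append(text[start:end])
            start = end
            i = end
        else:
            i += 1
    # Trailing text without a final separator is still a paragraph.
    if start < n:
        paragraphs.append(text[start:])
    return paragraphs
```

With this, a string containing LS stays in one paragraph across the LS,
while CR LF and PS each end a paragraph.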

Received on Tue Oct 11 2011 - 02:31:04 CDT