Re: Unicode plain text

From: Pierre Lewis (lew@nortel.ca)
Date: Tue May 27 1997 - 14:39:00 EDT


A couple of days back, Otto wrote:

> the English phrases "the United States of America", and "the United
> Kingdom" are imbedded in a run of Arabic text (here represented by
> "<<<<<<"). (Note that the paragraphs are right-justified, as Arabic
> is right-to-left). Just try to scan the following example with your
> eyes, following the direction indicated by the "<" marks when viewed
> as arrow-points. You would certainly prefer
> the United <<<< << <<<<< << <<<<< <<<<< <<< <<<<<<
> ------------------- page-break -------------------
> <<<<<< the United Kingdom <<< States of America
> .<<<< <<<< <<<<<< <<<<
> over
> of America <<<< << <<<<< << <<<<< <<<<< <<< <<<<<<
> ------------------- page-break -------------------
> <<<<<< the United Kingdom <<< the United States
> .<<<< <<<< <<<<<< <<<<
>
> Hence, I think, the bidi algorithm should treat every line break,
> and, a fortiori, every page break, as the start of a new block of
> text (bidi-wise, not logically!).

I didn't see any answer, so I'll risk one. This scenario isn't a
problem. The BIDI algorithm is a two-step process:

1) Resolve the embedding levels. This is done on a block of text
   which can span lines, pages, whatever. Current block terminators
   are LS (LINE SEPARATOR, U+2028; it probably shouldn't be one) and
   PS (PARAGRAPH SEPARATOR, U+2029).

2) Reorder the text on a LINE-BY-LINE basis using the resolved
   embedding levels. (my emphasis)

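So the second possibility above cannot happen even if the
whole paragraph is a single block in the BIDI sense. Reordering
never moves a character from one line to another, so the words that
come first logically ("the United") stay on the first line and the
rest of the phrase ("States of America") stays on the second.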
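
To make the two steps concrete, here is a rough sketch in Python. It is
not the full algorithm, only a toy under loud assumptions: uppercase
letters stand in for Arabic (level 1), lowercase letters are Latin
(level 2), the block is right-to-left, and neutrals are resolved by a
much-simplified rule. The function names and the (start, end) line
ranges are mine, purely for illustration.

```python
def resolve_levels(text):
    """Step 1 (toy version): one embedding level per character, computed
    over the whole block regardless of where lines will break.
    Assumption: uppercase = Arabic stand-in (level 1), lowercase = Latin
    (level 2), base direction right-to-left (level 1)."""
    strong = [2 if c.islower() else 1 if c.isupper() else None for c in text]
    levels = []
    for i, s in enumerate(strong):
        if s is None:
            # Simplified neutral rule: a neutral joins the Latin run only
            # when the nearest strong character on each side is Latin.
            before = next((x for x in reversed(strong[:i]) if x is not None), 1)
            after = next((x for x in strong[i + 1:] if x is not None), 1)
            s = 2 if before == after == 2 else 1
        levels.append(s)
    return levels


def reorder_line(chars, levels):
    """Step 2: reorder ONE line. From the highest level down to the lowest
    odd level, reverse every contiguous run at that level or higher."""
    chars = list(chars)
    if not chars:
        return ""
    highest = max(levels)
    lowest_odd = min((l for l in levels if l % 2), default=highest + 1)
    for level in range(highest, lowest_odd - 1, -1):
        i = 0
        while i < len(chars):
            if levels[i] >= level:
                j = i
                while j < len(chars) and levels[j] >= level:
                    j += 1
                chars[i:j] = reversed(chars[i:j])
                i = j
            else:
                i += 1
    return "".join(chars)


def display(text, line_ranges):
    """Resolve levels once for the block, then reorder line by line.
    line_ranges: hypothetical (start, end) slices chosen by a line breaker."""
    levels = resolve_levels(text)
    return [reorder_line(text[a:b], levels[a:b]) for a, b in line_ranges]


text = "ARABIC the united states MORE"
# Break the line inside the English phrase, dropping the space at the break.
for line in display(text, [(0, 17), (18, 29)]):
    print(line.rjust(20))
```

Reading each printed line from the right, "the united" ends the first
line and "states" begins the second, which is the layout Otto prefers,
even though the levels were resolved over the whole block.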

Pierre


