Re: about P1 part of BIDI algorithm

From: Eli Zaretskii <eliz_at_gnu.org>
Date: Tue, 11 Oct 2011 19:43:53 +0200

> From: Philippe Verdy <verdy_p_at_wanadoo.fr>
> Date: Tue, 11 Oct 2011 17:44:12 +0200
> Cc: Martin J. Dürst <duerst_at_it.aoyama.ac.jp>,
> libo.imc_at_gmail.com, unicode_at_unicode.org
>
> Then your implementation is completely counterintuitive: imagine what
> will happen if you press Enter in an RTL context, on a line that is
> continued below, where all the lines are part of an LTR paragraph or
> document: all RTL characters in the line after this position (including
> possibly all the continuation lines below) would suddenly move upward
> before the position where you pressed Enter to insert a forced
> linebreak, and all RTL characters before that position (including on
> previous continued lines) would swap down...

That's true. Which is why I agree that this is a limitation of the
current implementation. However, I explained in a previous mail why
the situation you describe is expected to be rare. And even with this
limitation, it is a significant improvement for users of bidirectional
scripts, because the previous versions had zero support for them.

The plan is to fix this irregularity in future versions. Emacs 24.1
is already in its pretest, which means significant changes are not
allowed.

> You need to make a distinction between breaks introduced only visually
> by automatic linewraps before continuation lines, and true
> line/paragraph separators that are encoded in the edited text. They
> behave very differently.

The distinction is there. The problem is not the distinction, but the
details of the implementation of breaking a long line. More
accurately, the details of what the display engine does when it
realizes that the next character does not fit on the display line.
This part of the implementation needs to be changed, and changed
significantly.

> If you cannot do that, then disable all linewraps and accept an edit
> mode where long paragraphs will scroll horizontally.

Linewrap is a user option in Emacs, and moreover, it has 3 or 4
different flavors. It would be unthinkable to disable them for the
benefit of what I think is a marginal use case, even if we consider
only users of bidi scripts, let alone all the rest.

> This means that you need to break lines in two separate steps: first
> between paragraphs or forced line breaks, then a second time after
> determining the direction of all characters and then replacing
> their possible mirroring to infer the position of inserted linewraps
> (i.e. the virtual position where there could be a LINE SEPARATOR
> producing the same effect; most often where there are whitespaces or
> SHY). Reordering will still occur only after this step, and remains
> limited to a line not to a full paragraph.

This would mean running the reordering twice, since you don't know
where to wrap a line until you lay out all of its grapheme clusters,
and you cannot do the layout until you reorder the characters. That's
because the dimensions of each character are generally different, and
so without knowing which character will be the first on the line, which
the second, etc. _in_the_visual_order_, you cannot compute their
total width, and without that you cannot decide where to break the
line.
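
For readers following the thread, here is a minimal Python sketch of the
break-then-reorder scheme described in the quoted text. It is only an
illustration under strong simplifying assumptions: embedding levels are
taken as already resolved, and every character has a fixed width that
does not depend on its neighbors (no shaping or composition), which is
exactly the simplification that does not hold in a real display engine.
The function names (wrap_logical, reorder_line, layout) are hypothetical
and are not Emacs code. The per-line reordering follows rule L2 of
UAX #9.

```python
def wrap_logical(widths, max_width):
    """Greedy wrap in logical order; returns (start, end) index pairs."""
    lines, start, used = [], 0, 0
    for i, w in enumerate(widths):
        # Start a new line when the next character does not fit,
        # but always keep at least one character per line.
        if used + w > max_width and i > start:
            lines.append((start, i))
            start, used = i, 0
        used += w
    if start < len(widths):
        lines.append((start, len(widths)))
    return lines


def reorder_line(levels):
    """UAX #9 rule L2 for one line: from the highest level down to the
    lowest odd level, reverse every maximal run of characters at that
    level or higher. Returns logical indices in visual order."""
    order = list(range(len(levels)))
    odd = [lv for lv in levels if lv % 2]
    if not odd:
        return order  # purely left-to-right line, nothing to reverse
    for level in range(max(levels), min(odd) - 1, -1):
        i = 0
        while i < len(order):
            if levels[order[i]] >= level:
                j = i
                while j < len(order) and levels[order[j]] >= level:
                    j += 1
                order[i:j] = reversed(order[i:j])
                i = j
            else:
                i += 1
    return order


def layout(text, levels, widths, max_width):
    """Assumed inputs: one resolved level and one fixed width per character."""
    out = []
    for start, end in wrap_logical(widths, max_width):
        order = reorder_line(levels[start:end])
        out.append("".join(text[start + k] for k in order))
    return out


# Uppercase stands in for RTL characters (level 1), lowercase for LTR.
print(layout("abcDEFghi", [0, 0, 0, 1, 1, 1, 0, 0, 0], [1] * 9, 5))
```

With these inputs the RTL run "DEF" is split by the wrap, and each line
is reordered on its own, so the first line shows "abcED" and the second
"Fghi". Once widths depend on the visual neighbors of a character, the
wrap position can no longer be computed this simply.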

I have already designed an algorithm that will yield a correct layout
in these cases, an algorithm that does not need this two-step
reordering. It does them both on the fly. It's just that it's too
late to make such significant and potentially destabilizing changes
in the next version of Emacs. It will have to wait for a future one.
Received on Tue Oct 11 2011 - 12:50:24 CDT
