Re: about P1 part of BIDI algorithm

From: Eli Zaretskii <eliz_at_gnu.org>
Date: Tue, 11 Oct 2011 03:43:57 -0400

> Date: Tue, 11 Oct 2011 10:53:39 +0900
> From: "Martin J. Dürst" <duerst_at_it.aoyama.ac.jp>
> CC: li bo <libo.imc_at_gmail.com>, unicode_at_unicode.org
>
> I might add here that 'break a line' in the Bidi algorithm is done
> before actual reordering (which is done line-by-line), but after
> calculating all the levels.

Please be aware that this separation of the UBA into phases makes no
sense at all in the context of the Emacs display engine. The UBA is
written from the POV of batch processing of a block of text -- you
pass in a string in logical order, and receive a reordered string in
return. The UBA describes the processing as a series of phases, each
one of which is completed for all the characters in the block of text
before the next phase begins.
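
To make the batch model concrete, here is a deliberately simplified
sketch. It is a toy, not the UBA and not Emacs code: the name
batch_reorder is hypothetical, the "levels" are just 0 or 1 taken from
each character's own bidi class, and neutrals, numbers, and explicit
embeddings are ignored. The only point is the shape: every phase runs
over the whole block before the next phase starts.

```python
import unicodedata

def batch_reorder(text):
    """Toy batch reordering: each phase covers the whole block of text."""
    # Phase 1: classify every character in the block.
    classes = [unicodedata.bidirectional(ch) for ch in text]
    # Phase 2: assign a level to every character.
    # Toy rule (an assumption): strong right-to-left classes get level 1.
    levels = [1 if c in ("R", "AL") else 0 for c in classes]
    # Phase 3: reverse each maximal run of level-1 characters.
    out = []
    i = 0
    while i < len(text):
        j = i
        while j < len(text) and levels[j] == levels[i]:
            j += 1
        run = text[i:j]
        out.append(run[::-1] if levels[i] == 1 else run)
        i = j
    return "".join(out)
```

Note that phase 3 cannot start until phase 2 has seen the last
character of the block, which is exactly what a one-character-at-a-time
consumer cannot wait for.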

By contrast, the Emacs display engine examines the text to display one
character at a time. For each character, it loads the necessary
display and typeface information, and then decides whether it will fit
the display line. Then it examines the next character, and so on. It
should be clear that processing characters one by one completely
disrupts the subdivision of the UBA into the phases that include
examination of more than that single character, let alone decisions of
where to break the line, because reordering can no longer be done
"line by line".

Let me give you just one example: if the character should be mirrored,
you cannot decide whether it fits the display line until _after_ you
know what its mirrored glyph looks like. But mirroring is only
resolved at a very late stage of reordering, so if you want to reorder
_after_ breaking into display lines, you will have to back up and
reconsider that decision after reordering, which will slow you down.
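
The mirroring dependency can be sketched as follows. Everything here
is illustrative and assumed rather than taken from Emacs: the glyph
widths and the two-entry mirror table are made up, the names
display_glyph and fill_line are hypothetical, and the resolved level of
each character is simply handed in, whereas in a real engine computing
it is the hard part. A character gets its mirrored glyph only if it has
the Bidi_Mirrored property and its resolved level is odd.

```python
import unicodedata

# Hypothetical glyph widths in pixels; a real engine would ask the font.
GLYPH_WIDTH = {"(": 4, ")": 6}
DEFAULT_WIDTH = 5
# Tiny hand-written excerpt of mirror pairs, for illustration only.
MIRROR = {"(": ")", ")": "("}

def display_glyph(ch, level):
    """Return the glyph to draw for ch at the given resolved level."""
    # Mirror only Bidi_Mirrored characters at an odd (right-to-left) level.
    if level % 2 == 1 and unicodedata.mirrored(ch):
        return MIRROR.get(ch, ch)
    return ch

def fill_line(chars_with_levels, line_width):
    """Take characters one at a time until the next glyph would not fit."""
    used = 0
    line = []
    for ch, level in chars_with_levels:
        glyph = display_glyph(ch, level)
        width = GLYPH_WIDTH.get(glyph, DEFAULT_WIDTH)
        if used + width > line_width:
            break  # this glyph starts the next display line
        line.append(glyph)
        used += width
    return line
```

With these made-up widths, the same "(" fits a 9-pixel line at level 0
but not at level 1, so the line break depends on a decision that the
UBA description resolves only late in reordering.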

Given these considerations, it is small wonder that the UBA
implementation inside Emacs is _very_ different from the description
in UAX#9. Therefore, the subdivision into phases that operate at the
line level and at higher levels makes very little sense here: the
implementation had to produce identical results while performing
significant surgery on the algorithm as described. In effect, the
UBA implementation in Emacs treated UAX#9 as a set of requirements,
not as a high-level description of the implementation.
Received on Tue Oct 11 2011 - 02:46:30 CDT
