Re: about P1 part of BIDI alogrithm from Martin J. Dürst on 2011-10-11 (Unicode Mail List Archive)

From: Martin J. Dürst <duerst_at_it.aoyama.ac.jp>
Date: Tue, 11 Oct 2011 18:54:10 +0900

Hello Eli,

There is absolutely no problem to treat the algorithm in UAX#9 as a set
of requirements, and come up with a totally different implementation
that produces the same results. I think actually UAX#9 says so somewhere.

But what is, strictly speaking, not allowed is to change the
requirements. One requirement of the algorithm is that when lines are
broken, logically earlier characters stay on earlier lines, and
logically later characters move to later lines.

In this respect, your implementation doesn't conform to UAX#9. There's
an external reason for this, and an internal one. The external reason is
that continuation lines in Emacs are in general just an overflow device,
text in Emacs isn't supposed to be broken into lines in the same way as
e.g. word processors break lines to form paragraphs. I'm not sure how
much it is true (line breaks often e.g. interfere with formatting in
Japanese and other languages that don't use spaces between words and
don't work well with a convention of "convert a line break in the source
to a space in the output"), but I think to some extent it is true.

The internal reason is the one you describe below. It may indeed be a
strong reason from an implementation perspective, but from an user
perspective, it's a very weak reason. Also, I don't understand it fully.
You say that the Emacs display engine examines each character in turn.
Assuming these are in logical order, you would just examine them up to
the point where you have "about one line" of glyphs. There would indeed
be a bit of back and forth there because of the interaction between bidi
algorithm and glyph selection (but as far as I know, mirrored glyphs
mostly have the same width as their originals). Anyway, that bit of back
and forth seems to be much less of a problem than the back and forth
that you get when you have to reorder over much larger distances because
you're essentially considering a whole paragraph as a single line. But
I'm not an expert in Emacs display engine details, so I can't say for sure.

Regards, Martin.

On 2011/10/11 16:43, Eli Zaretskii wrote:
>> Date: Tue, 11 Oct 2011 10:53:39 +0900
>> From: "Martin J. Dürst"<duerst_at_it.aoyama.ac.jp>
>> CC: li bo<libo.imc_at_gmail.com>, unicode_at_unicode.org
>>
>> I might add here that 'break a line' in the Bidi algorithm is done
>> before actual reordering (which is done line-by-line), but after
>> calculating all the levels.
>
> Please be aware that this separation of the UBA into phases makes no
> sense at all in the context of Emacs display engine. The UBA is
> written from the POV of batch processing of a block of text -- you
> pass in a string in logical order, and receive a reordered string in
> return. The UBA describes the processing as a series of phases, each
> one of which is completed for all the characters in the block of text
> before the next phase begins.
>
> By contrast, the Emacs display engine examines the text to display one
> character at a time. For each character, it loads the necessary
> display and typeface information, and then decides whether it will fit
> the display line. Then it examines the next character, and so on. It
> should be clear that processing characters one by one completely
> disrupts the subdivision of the UBA into the phases that include
> examination of more than that single character, let alone decisions of
> where to break the line, because reordering can no longer be done
> "line by line".
>
> Let me give you just one example: if the character should be mirrored,
> you cannot decide whether it fits the display line until _after_ you
> know what its mirrored glyph looks like. But mirroring is only
> resolved at a very late stage of reordering, so if you want to reorder
> _after_ breaking into display lines, you will have to back up and
> reconsider that decision after reordering, which will slow you down.
>
> Given these considerations, it is a small wonder that the UBA
> implementation inside Emacs is _very_ different from the description
> in UAX#9. Therefore, the subdivision into phases that are on the line
> and higher levels makes very little sense here, since the
> implementation needed to produce an identical result while performing
> a significant surgery on the algorithm description. In effect, the
> UBA implementation in Emacs treated UAX#9 as a set of requirements,
> not as a high-level description of the implementation.
>
>
Received on Tue Oct 11 2011 - 04:58:27 CDT

This archive was generated by hypermail 2.2.0 : Tue Oct 11 2011 - 04:58:28 CDT