RE: 3 big bidi bugs

From: Bernard Miller
Date: Thu May 30 2002 - 11:36:06 EDT

Mark Davis wrote:
> [L2] is not the following:

I'm glad to hear that "bug" 1 is not how L2 is intended to work (this means
that the answer to FAQ question 12 "Is Bytext bidirectionality compatible
with Unicode bidirectionality?" is simply yes, instead of a qualified yes).
I don't wish to give the impression that I care too much about semantic
errors, but if you can't acknowledge that what was said in L2 was not what
was intended (instead of just being unclear) I'm going to have to call you
on that:

Let's say you have a line consisting of characters that are all at embedding
level 4... How is "3" considered to be the lowest odd level on that line? It's no
more the lowest odd level than 5 or 1 is. At best, if you consider a
character with embedding level 4 to actually consist of 4 and each lower
embedding level (4, 3, 2, 1, and zero), which is not entirely unreasonable,
then 1 will always be the lowest odd embedding level on every line except a
line consisting of all zeros. But since L2 doesn't say " 1", it rules
out this interpretation.

A function implementing L2 might go through the following steps on each line:
1. find the highest level
2. find the lowest odd level
For a line consisting of all 4's as above, step 1 will return 4 and step 2
should return null since there are no odd levels on the line. A list
consisting of "from 4 to null" can only reasonably be interpreted as
consisting only of 4. Going on with this you get the "bugs" I describe.
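
To make the two steps concrete, here is a minimal Python sketch of that literal reading. All names (reverse_runs, lowest_odd_level, reorder_literal, reorder_skip_if_no_odd) are hypothetical and are not taken from any existing implementation. The second function is only one possible alternative behaviour, included for comparison; it is an assumption and not a statement of what L2 intends.

```python
def reverse_runs(order, levels, threshold):
    """Reverse each maximal run of positions whose level is >= threshold."""
    out = list(order)
    i, n = 0, len(out)
    while i < n:
        if levels[out[i]] >= threshold:
            j = i
            while j < n and levels[out[j]] >= threshold:
                j += 1
            out[i:j] = reversed(out[i:j])
            i = j
        else:
            i += 1
    return out


def lowest_odd_level(levels):
    """Step 2: lowest odd level actually present on the line, or None."""
    odd = [lv for lv in levels if lv % 2 == 1]
    return min(odd) if odd else None


def reorder_literal(levels):
    """Literal reading: 'from 4 to null' is treated as just level 4."""
    order = list(range(len(levels)))
    if not levels:
        return order
    highest = max(levels)              # step 1
    lowest = lowest_odd_level(levels)  # step 2
    if lowest is None:
        lowest = highest               # the list collapses to the highest level
    for threshold in range(highest, lowest - 1, -1):
        order = reverse_runs(order, levels, threshold)
    return order


def reorder_skip_if_no_odd(levels):
    """Assumed alternative for comparison: no odd level means no reversal."""
    order = list(range(len(levels)))
    lowest = lowest_odd_level(levels)
    if lowest is None:
        return order
    for threshold in range(max(levels), lowest - 1, -1):
        order = reverse_runs(order, levels, threshold)
    return order


if __name__ == "__main__":
    all_fours = [4, 4, 4]
    print(reorder_literal(all_fours))         # visual order under the literal reading
    print(reorder_skip_if_no_odd(all_fours))  # visual order under the alternative
```

Each function returns the logical indices in visual order. For a line of all 4's the literal reading reverses the whole line once, while the alternative leaves it untouched; that difference is what an implementation could test for.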

If you are familiar with each implementation of the algorithm, it might be
reassuring to users if you can state that none actually work in the manner
above. Any other implementations might want to test for this.

> I believe other people addressed the other two items you thought were
> bugs.

Other people have not addressed "bug" 2 accurately. Here's an impromptu
shorthand to summarize the issue:

RLE..."LRE...PDF" looks ok on 1 or more lines, unless a strong L character
precedes or follows the quotation, as in: RLE...L "LRE...PDF"

LRE..."RLE...PDF" looks ok on 1 or more lines, unless a strong R character
precedes or follows the quotation, as in: LRE...R "RLE...PDF"

LRE...RLE"..."PDF looks ok on 1 line, looks messed up on multiple lines

RLE...LRE"..."PDF looks ok on 1 line, looks messed up on multiple lines
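
For readers following the shorthand, the sketch below computes the explicit embedding level each character ends up with. It is a simplified illustration under stated assumptions: the paragraph level is 0, the depth limit and override codes are ignored, and the token names and the embedding_levels function are hypothetical. It covers only the level assignment (RLE moves to the next higher odd level, LRE to the next higher even level, PDF returns to the previous level), not weak/neutral resolution or reordering, so it does not by itself show which cases look right on screen.

```python
RLE, LRE, PDF = "RLE", "LRE", "PDF"


def embedding_levels(tokens, base_level=0):
    """Return (character, level) pairs; formatting codes themselves are dropped."""
    stack = []          # levels to return to on PDF
    level = base_level
    result = []
    for tok in tokens:
        if tok == RLE:
            stack.append(level)
            # least odd level greater than the current one
            level = level + 1 if level % 2 == 0 else level + 2
        elif tok == LRE:
            stack.append(level)
            # least even level greater than the current one
            level = level + 2 if level % 2 == 0 else level + 1
        elif tok == PDF:
            if stack:   # an unmatched PDF is ignored
                level = stack.pop()
        else:
            result.append((tok, level))
    return result


if __name__ == "__main__":
    # RLE..."LRE...PDF": quotes stay at the outer level, quoted text is one deeper
    print(embedding_levels([RLE, "a", '"', LRE, "b", PDF, '"']))
    # LRE...RLE"..."PDF: quotes and quoted text share the inner level
    print(embedding_levels([LRE, "a", RLE, '"', "b", '"', PDF]))
```

The two spellings differ in whether the quotation marks sit at the outer or the inner level, which is the distinction the shorthand above is pointing at.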

This "bug" is weaker than I originally thought, but it still belongs in
question 13 of the Bytext FAQ "How is using bidirectionality in Bytext
easier than in Unicode?"... even Tim Partridge didn't get it right as to how
to spell embedded quotations ("Surely if the quotation is meant to be right
to left the RLE and PDF should be outside the entire thing, including the
quotes"). These kinds of issues
can be summarized as an overdependence on character properties, language
specific conventions, and formatting characters with overlapping
functionality that allow multiple spellings for the same formatting. In
other words, as others have said, the Unicode bidirectional algorithm is too
complex. The (new) Bytext encoding of bidirectionality shifts the complexity
to the level of transcoding and to input methods. It effectively eliminates
multiple encodings that achieve the same embedding levels, so like
everything else in Bytext it is more regular expression friendly. 


Bernard Rafael Miller, email:
Format enabling simplified 8 bit regexes of UCS characters:
“We believed that the cybernetic approach to consciousness, whipped up
frothy, would carry us to a plateau overlooking a pleasant mirror, but
instead left us blathering in the dressed up solitude of mannequin
planets.” --Steven Jesse Bernstein
