A real bug in bidi

From: Roozbeh Pournader (roozbeh@sharif.edu)
Date: Fri Jan 05 2001 - 09:16:03 EST

Dear Unicoders,

This time I think we have found a real bug in the Bidirectional Algorithm.
The problem is that the algorithm seems to contradict itself. We were
trying to use the "Implementation Notes" at the end of UTR#9 to retain the
format codes, but that doesn't produce the same results as removing them
in rule X9. We would really appreciate any comments.

Would you please take your pencils out? ;)

Our example is probably not the simplest case, but is small enough:

        U+202B U+05D1 U+202C U+0031 U+202D U+0061 U+202C
        <RLE> BET <PDF> 1 <LRO> a <PDF>
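For anyone following along, the bidi classes of these code points can be checked with a short Python sketch. It is only an illustration using the standard unicodedata module, not part of our implementation; the names `example` and `bidi_types` are made up for it.

```python
import unicodedata

# The example string: <RLE> BET <PDF> 1 <LRO> a <PDF>
example = "\u202B\u05D1\u202C\u0031\u202D\u0061\u202C"

# Look up the Bidi_Class of each character in logical order.
bidi_types = [unicodedata.bidirectional(ch) for ch in example]

print(" ".join(bidi_types))  # RLE R PDF EN LRO L PDF
```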

When we run the algorithm with the notes in "Retaining Format Codes", we
get the following levels:

        <RLE> BET <PDF> 1 <LRO> a <PDF>
          1    3    3   2   1   2   1

which according to L2 becomes:

        <PDF> a <LRO> <PDF> BET 1 <RLE>

when rendered visually. That's "a BET 1". But when the format codes are
removed in X9, the levels will be:

        BET 1 a
         3  2 2

which becomes "BET 1 a" when rendered. So the order is different, you see.
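To make the comparison easy to repeat, here is a minimal Python sketch of rule L2 as I read it: from the highest level down to the lowest odd level on the line, reverse every contiguous run at that level or higher. The function name `reorder_l2` and the token spellings are assumptions made for this illustration, and the sketch ignores mirroring and line breaking.

```python
def reorder_l2(items, levels):
    """Apply rule L2 to one line; items and levels are in logical order."""
    pairs = list(zip(items, levels))
    odd_levels = [lv for lv in levels if lv % 2 == 1]
    if not odd_levels:
        return list(items)  # no odd level on the line: nothing to reverse
    for level in range(max(levels), min(odd_levels) - 1, -1):
        i = 0
        while i < len(pairs):
            if pairs[i][1] >= level:
                # Find the end of the maximal run at this level or higher.
                j = i
                while j < len(pairs) and pairs[j][1] >= level:
                    j += 1
                pairs[i:j] = pairs[i:j][::-1]
                i = j
            else:
                i += 1
    return [item for item, _ in pairs]

# Format codes retained, with the levels listed above.
retained = reorder_l2(["RLE", "BET", "PDF", "1", "LRO", "a", "PDF"],
                      [1, 3, 3, 2, 1, 2, 1])
# Format codes removed in X9.
removed = reorder_l2(["BET", "1", "a"], [3, 2, 2])

print(retained)  # ['PDF', 'a', 'LRO', 'PDF', 'BET', '1', 'RLE']
print(removed)   # ['BET', '1', 'a']
```

In the retained case the <LRO> at level 1 splits what is a single run at level 2 or higher once the codes are removed, and that is where the two orders part ways.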

(I do not claim anything about user expectations for this example, because
both results are against my expectation: I expected "a 1 BET". I would also
appreciate comments on your expectations.)

We may have made a mistake, I know, but we have checked this many times.
Here are the intermediate results I obtained from running the algorithm
while retaining the format codes:

Original character types: "RLE R PDF EN LRO L PDF"

      P1-P3: paragraph embedding level becomes 1.
      X1-X8: levels become "? 3 ? 1 ? 2 ?".
modified X9: types become "BN R BN EN BN L BN",
             levels become "1 3 3 1 1 2 2".
        X10: four runs, (sor, eor) are (R, R), (R, R), (R, L), (L, L).
      W1-W5: no change.
modified W6: types become "ON R ON EN ON L ON".
         W7: no change.
         N1: types become "R R R EN ON L L"
         N2: types become "R R R EN R L L"
      I1-I2: levels become "1 3 3 2 1 2 2".
modified L1: levels become "1 3 3 2 1 2 1".
         L2: the ordering becomes "<PDF> a <LRO> <PDF> BET 1 <RLE>".
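Two of the steps above, X10 and I1-I2, are mechanical enough to check with a small Python sketch. This is only an illustration under my reading of the rules; the helper names (`direction`, `run_boundaries`, `resolve_implicit`) are hypothetical and the sketch covers only the types that occur in this example.

```python
def direction(level):
    """Odd levels are right-to-left, even levels left-to-right."""
    return "R" if level % 2 else "L"

def run_boundaries(levels, para_level):
    """X10: split into level runs and return (sor, eor) for each run."""
    result = []
    start, n = 0, len(levels)
    while start < n:
        end = start
        while end < n and levels[end] == levels[start]:
            end += 1
        # Neighbouring levels; the paragraph level stands in at the edges.
        before = levels[start - 1] if start > 0 else para_level
        after = levels[end] if end < n else para_level
        sor = direction(max(before, levels[start]))
        eor = direction(max(levels[start], after))
        result.append((sor, eor))
        start = end
    return result

def resolve_implicit(types, levels):
    """I1-I2: raise levels according to the resolved types."""
    out = []
    for t, lv in zip(types, levels):
        if lv % 2 == 0:          # I1: even embedding level
            if t == "R":
                lv += 1
            elif t in ("AN", "EN"):
                lv += 2
        else:                    # I2: odd embedding level
            if t in ("L", "EN", "AN"):
                lv += 1
        out.append(lv)
    return out

x9_levels = [1, 3, 3, 1, 1, 2, 2]          # after modified X9
n2_types = "R R R EN R L L".split()        # after N2

boundaries = run_boundaries(x9_levels, para_level=1)
final_levels = resolve_implicit(n2_types, x9_levels)

print(boundaries)    # [('R', 'R'), ('R', 'R'), ('R', 'L'), ('L', 'L')]
print(final_levels)  # [1, 3, 3, 2, 1, 2, 2]
```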

--roozbeh


This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:21:17 EDT