RE: 3 big bidi bugs

From: Jonathan Rosenne (rosenne@qsm.co.il)
Date: Wed May 29 2002 - 15:10:48 EDT


I don't think anything to do with 5 levels of embedding or overrides can
be considered a big bug.

Jony

> -----Original Message-----
> From: unicode-bounce@unicode.org
> [mailto:unicode-bounce@unicode.org] On Behalf Of Bernard Miller
> Sent: Wednesday, May 29, 2002 6:57 PM
> To: unicode@unicode.org
> Subject: 3 big bidi bugs
>
>
>
> This letter describes 3 major technical problems with the
> current Unicode bidirectional algorithm as described in UAX
> #9, version 3.20. Problems 1 and 3 have security
> implications. Other problems with the whole Unicode
> bidirectional encoding approach, and their solutions, are
> discussed in the recently updated Bytext FAQ and
> documentation (www.bytext.org).
>
> (1) Line width dependent mangling, general case:
> Step L2 of UAX #9 indicates that a line that resolves into a
> sequence of characters with homogeneous embedding levels will
> ALWAYS be displayed right to left, regardless of what the
> embedding level is.
>
> So, for example, a line with the L1 resolved embedding levels of:
>   2222222222222222222222222 will display right to left
>   3333333333333333333333333 will display right to left
>   4444444444444444444444444 will display right to left
>   etc.
>
> Likewise:
>   in 3333333333333333333333331, the 3’s will display left to right
>   in 5555555555555555555555551, the 5’s will display left to right
>   etc.
>
> This directly contradicts the writer's intentions. It means that
> different Unicode compliant applications will display the
> same characters in a different order (depending on available
> line width). Examples of how this is bad are given in
> question 12 of the Bytext FAQ (www.bytext.org/faq#12). This
> can be fixed by rewording step L2 such that a reversal
> happens from the highest embedding level to each lower
> contiguous embedding level, regardless of whether the embedding level
> is represented by a character on the line, until the
> embedding level of 1 is reached (or, as an optimization,
> until the first odd embedding level equal to or lower than
> the lowest embedding level represented by a character on the line).
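
The rewording proposed in (1) can be sketched as follows. This is a minimal illustration, not text from UAX #9: the function name `reorder_line` and the representation of levels as a list of integers are assumptions made here. It models only the reversal loop, running from the highest level on the line down to level 1, whether or not each intermediate level actually occurs on the line.

```python
def reorder_line(chars, levels):
    """Reorder one line for display (hypothetical helper, not from UAX #9).

    For each level from the highest one on the line down to 1, reverse
    every maximal run of characters whose level is at that level or higher.
    Levels that no character on the line carries are still visited.
    """
    if len(chars) != len(levels):
        raise ValueError("chars and levels must have the same length")
    out = list(chars)
    lv = list(levels)
    if not lv:
        return ""
    n = len(lv)
    for level in range(max(lv), 0, -1):
        i = 0
        while i < n:
            if lv[i] >= level:
                # Find the end of the run at this level or higher.
                j = i
                while j < n and lv[j] >= level:
                    j += 1
                # Reverse the characters and keep their levels attached.
                out[i:j] = out[i:j][::-1]
                lv[i:j] = lv[i:j][::-1]
                i = j
            else:
                i += 1
    return "".join(out)


if __name__ == "__main__":
    print(reorder_line("abc", [2, 2, 2]))
    print(reorder_line("abc", [3, 3, 3]))
    print(reorder_line("abcd", [3, 3, 3, 1]))
```

Under this sketch a line made up entirely of level 2 characters is reversed twice and so comes out left to right, while a line made up entirely of level 3 characters is reversed three times and comes out right to left, independent of which other levels happen to share the line.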
>
> (2) Line width dependent mangling, spelling conventions for quotes:
> What is the purpose of step X10 if not to allow
> something like LEFT DOUBLE QUOTATION MARK to be used as if it
> were an OPEN DOUBLE QUOTATION MARK? One simply puts an
> embedding inside a quotation, such as “<RLE>quotation<PDF>”.
> The problem with this is that it only works if the quotation
> begins and ends on the same line. Examples of how the text is
> mangled when the quotation spans multiple lines are given in
> question 13 of the Bytext FAQ (www.bytext.org/faq#13). This
> cannot really be fixed with minor changes other than to
> notify users that the whole left=open, right=closed idea may
> not work as such when the default automatic line breaking is
> used. Users should not rely on any spelling conventions that
> do not bypass the effects of step X10 and mirroring; how
> this can be done is described in the Bytext documentation.
>
> (3) Mirroring ambiguities:
> What if eor = sor?
>
> text:                          R  RLO whatever PDF  N  LRO whatever PDF
> embedding level at step X9:    1      3   3         1      2   2
> directional type at step X10:  R      R   R         ?      L   L
>
> The above example should be in a monospace font. The original
> is at www.bytext.org/faq#12. Step X10 is ambiguous whether
> the “N” should be L or R. This means that if N has the
> mirrored property, some implementations might display the
> mirrored form, others the non-mirrored form, and others might
> result in an error. This can be fixed by deciding on a single
> form for such cases. Also, the
> statement: “for two adjacent runs, the eor of the first run
> is the same as the sor of the second” needs to be removed
> because it is not true.
>
> Bernard
> ---
> Bernard Rafael Miller, email: bernard_r_miller@bytext.org
> Format enabling simplified 8 bit regexes of UCS characters:
> www.bytext.org
> ---
>
>
>
>


