2003-01-22, MED
The key issue for the Unicode BIDI committee before the next UTC meeting is to discuss and come to consensus on item #5: whether (logically) shaping gets applied before or after BIDI directional reordering. In most cases, this doesn't matter, but it can affect the result. The following describes the possible differences in appearance, and outlines options for the committee to decide among.
We will first set up a simple test case. Suppose that we have the following string of Arabic characters in memory, as characters 1, 2, 3, and 4.
1 | 2 | 3 | 4 |
---|---|---|---|
ج 062C JEEM | ع 0639 AIN | ل 0644 LAM | م 0645 MEEM |
L | L | R | R |
We will override the first two characters to be LTR. So that we can show both paragraph directions, the next two will be embedded, but with the normal RTL direction. One can use embedding codes to get this effect in plain text, or markup in HTML.
This is reproduced below, although the effect in the last three rows will depend on the browser's BIDI support of these characters and/or HTML styles.
Codes | Left-Right Paragraph | Right-Left Paragraph |
---|---|---|
LRM/RLM LRO JEEM AIN PDF RLO LAM MEEM PDF | جعلم | جعلم |
<p dir="ltr"/"rtl"> LRO JEEM AIN PDF RLO LAM MEEM PDF | جعلم | جعلم |
<p dir="ltr"/"rtl"> <bdo dir="ltr"> JEEM AIN </bdo> <bdo dir="rtl"> LAM MEEM </bdo> | جعلم | جعلم |
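The plain-text encoding in the first row of the table can be sketched as follows. This is only an illustration: the helper name `build_test_string` is hypothetical, and it assumes the leading LRM or RLM is there to set the paragraph direction, as the first strong character.

```python
import unicodedata

# Directional formatting characters used in the first table row.
LRM, RLM = "\u200E", "\u200F"                  # left-to-right / right-to-left marks
LRO, RLO, PDF = "\u202D", "\u202E", "\u202C"   # overrides and pop directional formatting

# The four test letters, in memory (logical) order.
JEEM, AIN, LAM, MEEM = "\u062C", "\u0639", "\u0644", "\u0645"

def build_test_string(paragraph_mark):
    """Hypothetical helper: mark, then LRO JEEM AIN PDF RLO LAM MEEM PDF."""
    return paragraph_mark + LRO + JEEM + AIN + PDF + RLO + LAM + MEEM + PDF

ltr_text = build_test_string(LRM)   # intended for a left-right paragraph
rtl_text = build_test_string(RLM)   # intended for a right-left paragraph

# List each code point with its bidirectional class and character name.
for ch in ltr_text:
    print(f"U+{ord(ch):04X}  {unicodedata.bidirectional(ch):<4} {unicodedata.name(ch)}")
```

How either string is actually displayed still depends on the BIDI and shaping support of whatever renders it.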
The resulting display order will be one of the following, depending on the paragraph direction.
[Figure: display order of the four characters, for a Left-Right Paragraph and a Right-Left Paragraph]
There are a number of possible shaping results, depending on what happens within runs and what happens across runs. The four most likely candidates are:
A. If we shape, then apply BIDI, we get the following visual result:
[Figure: result of (A), for a Left-Right Paragraph and a Right-Left Paragraph]
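As a rough illustration of option (A), the sketch below assigns joining forms purely from logical (memory) order, before any reordering. It rests on two assumptions not taken from the text above: that all four test letters are dual-joining, and that the directional control codes between them are ignored for joining purposes. The function name `shape_logical` is hypothetical.

```python
def shape_logical(names):
    """Assign a joining form to each letter from its logical neighbours only.

    Assumes every letter is dual-joining and that nothing between the
    letters (such as directional controls) interrupts joining.
    """
    forms = []
    last = len(names) - 1
    for i, name in enumerate(names):
        joins_prev = i > 0      # a letter precedes this one in memory
        joins_next = i < last   # a letter follows this one in memory
        if joins_prev and joins_next:
            form = "medial"
        elif joins_next:
            form = "initial"
        elif joins_prev:
            form = "final"
        else:
            form = "isolated"
        forms.append((name, form))
    return forms

shaped = shape_logical(["JEEM", "AIN", "LAM", "MEEM"])
print(shaped)
# [('JEEM', 'initial'), ('AIN', 'medial'), ('LAM', 'medial'), ('MEEM', 'final')]
```

Under (A) these forms would be fixed at this point, and the BIDI algorithm would then move the already-shaped glyphs without changing them.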
B. If we shape simply according to the resulting display order (after BIDI), we get the following:
[Figure: result of (B), for a Left-Right Paragraph and a Right-Left Paragraph]
C. If we shape simply according to the resulting display order (after BIDI), but don't shape across direction-run boundaries, we get the following:
[Figure: result of (C), for a Left-Right Paragraph and a Right-Left Paragraph]
D. If we simply don't shape characters with overridden direction, we get the following:
[Figure: result of (D), for a Left-Right Paragraph and a Right-Left Paragraph]
I think the argument for (A) is that in practice it will be quite unusual to override the direction of Arabic letters, and it may not matter that the forms look odd. (A) may also be simpler to implement, since line breaks can be decided before applying the BIDI algorithm.
For (B) or (C), one could argue that the end result is less weird, and that in practice the BIDI algorithm must be applied to the entire paragraph anyway, so at that point the ordering is already known. (C) may be simpler to implement, since one never needs to look outside of directional boundaries for shaping. (D) is probably no simpler to implement, since one still has to determine the runs before deciding whether or not to shape.
Note: it appears that both Internet Explorer (IE) and Netscape Navigator (NN) use (C). Please try out other products to see what they do.
We could also, of course, have an approach Z:
Z. The results of shaping directionally-overridden characters are undefined, and could be any of the above.
The BIDI committee should discuss the ramifications of these approaches, hopefully developing a consensus before the next UTC meeting.