L2/06-010
Editorial suggestions for UAX #9: 5.0.0
Asmus Freytag
I find section 3.4 in UAX #9 a bit tough to follow. The problem is that the text
starts off with a number of digressions before getting to the actual
'reordering rules' themselves.
** Here's the current text, with digressions marked with [] (and my comments
marked with **):
The following algorithm describes the logical process of finding the
correct display order. [As described before, this
logical process is not necessarily the actual implementation, which may
diverge for efficiency as long as it produces the same results]. As
opposed to resolution phases, this algorithm acts on a per-line basis,
and is applied after any line wrapping is applied to the paragraph.
[The process of breaking a paragraph into one or
more lines that fit within particular bounds is outside the scope of the
bidirectional algorithm. Where character shaping is involved, it can be
somewhat more complicated (see Section 8.2 Arabic of [Unicode]).]
Logically there are the following steps:
- The levels of the text are determined according to the bidirectional
algorithm.
- The characters are shaped into glyphs according to their context
(taking the embedding levels into account for mirroring!).
- The accumulated widths of those glyphs (in logical order) are
used to determine line breaks.
- [Note that the
soft-hyphen (SHY) works as it does in other scripts.
(** the context for "other scripts" is buried above in
a reference!) That is, it indicates a point where the line
could be broken in the middle of a word. If the rendering system
breaks at that point, the display —
including shaping — should
be what is appropriate for the given language. For more information
on this and other line-breaking issues, see [UAX14].]
- For each line, rules L1-L4 are used to reorder the characters on
that line.
- The glyphs corresponding to the characters on the line are displayed
in that order.
L1. On each line, reset.....
** Here are suggested tweaks:
The following rules describe the logical
process of finding the correct display order. As opposed to resolution
phases, this algorithm acts on a per-line basis, and is applied after
any line wrapping is applied to the paragraph.
Logically there are the following steps:
- The levels of the text are determined according to the bidirectional
algorithm.
- The characters are shaped into glyphs according to their context
(taking the embedding levels into account for mirroring!).
(See also Section 3.5 Shaping).
- The accumulated widths of those glyphs (in logical order) are
used to determine line breaks (see [UAX14]).
- For each line, rules L1-L4 are used to reorder the characters on
that line.
- The glyphs corresponding to the characters on the line are displayed
in that order.
L1. On each line, reset.....
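** For illustration only, here is a minimal Python sketch of why reordering has
to act per line, after line wrapping. It covers only the reversal step (rule L2
of UAX #9: from the highest level down to the lowest odd level, reverse every
contiguous run at that level or higher). Rules L1, L3 and L4 are left out. The
names reorder_line, reorder_paragraph and line_ends are hypothetical, and the
resolved levels and line ends are assumed to be given.

```python
def reorder_line(levels):
    """Return the logical indices of one line in visual (left-to-right) order.

    `levels` holds the resolved embedding level of each character on the line.
    Only the reversal step (rule L2) is sketched here.
    """
    order = list(range(len(levels)))
    odd_levels = [lv for lv in levels if lv % 2 == 1]
    if not odd_levels:
        return order  # nothing right-to-left on this line
    highest = max(levels)
    lowest_odd = min(odd_levels)
    # Walk from the highest level down to the lowest odd level,
    # including intermediate levels that do not occur in the text.
    for level in range(highest, lowest_odd - 1, -1):
        i = 0
        while i < len(order):
            if levels[order[i]] >= level:
                j = i
                while j < len(order) and levels[order[j]] >= level:
                    j += 1
                order[i:j] = order[i:j][::-1]  # reverse this run in place
                i = j
            else:
                i += 1
    return order


def reorder_paragraph(levels, line_ends):
    """Apply reorder_line to each line; `line_ends` are exclusive end offsets."""
    lines, start = [], 0
    for end in line_ends:
        local = reorder_line(levels[start:end])
        lines.append([start + k for k in local])
        start = end
    return lines


print(reorder_line([0, 0, 0, 0, 1, 1, 1, 0]))
print(reorder_paragraph([0, 1, 1, 1, 1, 0], [3, 6]))
```

The first call prints [0, 1, 2, 3, 6, 5, 4, 7]. The second prints
[[0, 2, 1], [4, 3, 5]]: the four-character level-1 run is split by the line
break, and each piece is reversed within its own line rather than across the
whole run.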
** I suggest talking about rules, not an algorithm, here. Everything happens
under the umbrella of the Bidi Algorithm, so it's confusing to have reordering
be another algorithm. That also makes it unnecessary to reassert the
qualification about implementation vs. logical algorithm.
** However, it might be worth putting such a statement into section 3.1, right
before the paragraph "Combining characters". This is the place where the text
talks about the algorithm as a whole, and noting there that implementation is
different from the logical specification makes a lot more sense than burying
it in section 3.4.
** In Section 3.5, it makes sense to talk about Arabic (and related scripts).
Currently the text starts off as if the reader is familiar with the concept:
Shaping is logically applied after the bidirectional algorithm is
used, ....
** Instead, I suggest:
Cursively connected scripts, such as Arabic or Syriac, require the selection of
positional character shapes that depend on adjacent characters (see Section 8.2
Arabic of [Unicode]).
Shaping is logically applied after the bidirectional algorithm is
used,...
** To capture the fine points about shaping and
line breaking, why not introduce a numbered (or unnumbered) sub-section at
the end of 3.5? That's a good place to give the issue the visibility it
needs, without distracting from the main explanation. And by that time, the
reader has considered the generic issues with shaping. Here's a suggested
draft:
3.5.1 Shaping and line breaking
The process of breaking a paragraph into one or more lines that fit
within particular bounds is outside the scope of the bidirectional
algorithm. Where character shaping is involved, the width calculations must
be based on the shaped glyphs.
Note that the soft-hyphen (SHY) works in
cursively connected scripts as it does in other scripts. That is, it
indicates a point where the line could be broken in the middle of a word. If
the rendering system breaks at that point, the display
— including shaping
— should be what is appropriate for
the given language. For more information on this and other line-breaking
issues, see [UAX14].
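
** For illustration only, a minimal Python sketch of the width calculation this
draft has in mind: advance widths of the shaped glyphs are accumulated in
logical order, and the line is broken at the last permitted break opportunity
before the limit is exceeded. The names break_lines, glyph_widths and
break_after are hypothetical. The break opportunities are assumed to come from
a separate line-breaking pass (see [UAX14]), and the sketch ignores the fact
that breaking at a SHY, or inside a cursively connected word, can change the
shaping and therefore the widths.

```python
def break_lines(glyph_widths, break_after, max_width):
    """Greedy line breaking over shaped glyph widths in logical order.

    glyph_widths: advance width of the shaped glyph for each character.
    break_after:  set of indices after which a line break is permitted.
    Returns exclusive end offsets, one per line.
    """
    line_ends = []
    start = 0
    width = 0.0
    last_break = None  # most recent permitted break on the current line
    i = 0
    n = len(glyph_widths)
    while i < n:
        width += glyph_widths[i]
        if width > max_width and last_break is not None:
            # Too wide: end the line at the last opportunity and rescan from there.
            line_ends.append(last_break)
            start = last_break
            i = start
            width = 0.0
            last_break = None
            continue
        if i in break_after:
            last_break = i + 1
        i += 1
    if start < n:
        line_ends.append(n)  # final (possibly short) line
    return line_ends


print(break_lines([1] * 10, {2, 6}, 5))
```

With ten glyphs of width 1, break opportunities after indices 2 and 6, and a
limit of 5, this prints [3, 7, 10]. A line with no break opportunity is simply
allowed to overflow in this sketch.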