Unclear text in the UBA (UAX#9) of Unicode 6.3
eliz at gnu.org
Sun Apr 20 05:24:22 CDT 2014
Would someone please help me understand the following subtleties and
obscure language in the UBA document found at
http://www.unicode.org/reports/tr9/? Thanks in advance.
1. In paragraph 3.1.2, near its very end, we have this sentence (with
As rule X10 will specify, an isolating run sequence is the unit to
which the rules following it are applied, and the last character of
one level run in the sequence is considered to be immediately
followed by the first character of the next level run in the
sequence during this phase of the algorithm.
What is meant here by "the rules following it"? Following what?
2. In BD16 (paragraph 3.1.3), the 1st bullet says:
. Create a stack for elements each consisting of a bracket character
and a text position. Initialize it to empty.
But then 1st sub-bullet of the 3rd bullet says:
. If an opening paired bracket is found, push its
Bidi_Paired_Bracket property value and its text position onto
But the stack does not hold values of Bidi_Paired_Bracket property, it
holds characters. Items 2 and 3 below that say:
2. Compare the closing paired bracket being inspected or its
canonical equivalent to the bracket in the current stack
3. If the values match, meaning the two characters
form a bracket pair, then [...]
So I guess the 1st bullet is correct, but the 3rd bullet should say
"... push the opening paired bracket character and its text position
onto the stack". Is this the correct interpretation?
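To make the interpretation in question 2 concrete, here is a minimal
sketch of it in Python: the stack holds (opening bracket character, text
position) elements, and a closing bracket is matched by looking up the
counterpart of each stacked opener. The PAIRED table and the function
name are hypothetical stand-ins, not the real Bidi_Paired_Bracket data,
and the sketch ignores canonical equivalents and the stack-size limit
that BD16 also describes.

```python
# Hypothetical, tiny stand-in for the Bidi_Paired_Bracket data.
PAIRED = {"(": ")", "[": "]", "{": "}"}
CLOSERS = set(PAIRED.values())


def find_bracket_pairs(text):
    """Return (open_pos, close_pos) pairs, sorted by opening position.

    Assumed reading: the stack stores the opening bracket character
    itself together with its text position.
    """
    stack = []   # elements are (opening bracket character, text position)
    pairs = []
    for pos, ch in enumerate(text):
        if ch in PAIRED:
            stack.append((ch, pos))
        elif ch in CLOSERS:
            # Search the stack from the top element downwards.
            for i in range(len(stack) - 1, -1, -1):
                open_ch, open_pos = stack[i]
                if PAIRED[open_ch] == ch:
                    pairs.append((open_pos, pos))
                    # Pop the matched element and everything above it.
                    del stack[i:]
                    break
            # If nothing matched, the closing bracket is simply skipped.
    return sorted(pairs)
```

Under this reading, in "a(b[c)d]" only the round brackets pair up: the
"[" is popped together with "(" when ")" is matched, so the later "]"
finds an empty stack.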
3. Paragraph 3.3.2 says, under "Non-formatting characters":
X6. For all types besides B, BN, RLE, LRE, RLO, LRO, PDF, RLI, LRI,
FSI, and PDI:
. Set the current character’s embedding level to the embedding
level of the last entry on the directional status stack.
Note that the current embedding level is not changed by this rule.
What does this last sentence mean by "the current embedding level"?
The first bullet of X6 mandates that "the current character’s
embedding level" _is_ changed by this rule, so what other "current
embedding level" is alluded to here?
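One possible reading, sketched below in Python, is that "the current
embedding level" refers to the level stored in the last entry of the
directional status stack, which X6 reads but does not push, pop, or
modify. This is only an assumption about what the note means; the
function name and the tuple layout of a stack entry are made up for
illustration.

```python
def apply_x6(levels, index, status_stack):
    """Assign the character at `index` the level of the last stack entry.

    Assumed entry layout: (embedding_level, override_status, isolate_status).
    """
    embedding_level, _override, _isolate = status_stack[-1]
    levels[index] = embedding_level   # the character's level changes
    # The stack itself is left untouched, so the level that later
    # characters will receive is not changed by this step.


levels = [None, None, None]
stack = [(0, "neutral", False), (1, "neutral", False)]
apply_x6(levels, 1, stack)
```

After the call, levels[1] is 1 while the stack still has the same two
entries.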
4. Rule X10 says in its last bullet:
Apply rules W1–W7, N0–N2, and I1–I2, in the order in which they
appear below, to each of the isolating run sequences, applying one
rule to all the characters in the sequence in the order in which
they occur in the sequence before applying another rule to any part
of the sequence. The order that one isolating run sequence is
treated relative to another does not matter.
Does the last sentence mean that it is OK to apply W1 to the 1st
isolating sequence, then apply W1 to the second isolating sequence,
then apply W2 to the 1st isolating sequence, followed by W2
application to the 2nd isolating sequence, etc.? IOW, the last
sentence refers to the order of processing between the isolating run
sequences, but says nothing about the order of applying rules between
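The two schedules asked about in question 4 can be written down as
follows. This is a minimal Python sketch, not the UBA itself: w1_like
and w3_like are heavily simplified, hypothetical stand-ins for W1 and
W3, and the sketch assumes that a rule only reads and writes the
isolating run sequence it is applied to.

```python
def w1_like(types, sos="L"):
    """Simplified stand-in for W1: an NSM takes the previous type (or sos)."""
    prev = sos
    for i, t in enumerate(types):
        if t == "NSM":
            types[i] = prev
        prev = types[i]


def w3_like(types):
    """Simplified stand-in for W3: AL becomes R."""
    for i, t in enumerate(types):
        if t == "AL":
            types[i] = "R"


def rule_major(sequences, rules):
    """W1 on every sequence, then W3 on every sequence, and so on."""
    seqs = [list(s) for s in sequences]
    for rule in rules:
        for s in seqs:
            rule(s)
    return seqs


def sequence_major(sequences, rules):
    """All rules on the first sequence, then all rules on the second."""
    seqs = [list(s) for s in sequences]
    for s in seqs:
        for rule in rules:
            rule(s)
    return seqs


sequences = [["AL", "NSM", "L"], ["NSM", "R"]]
rules = [w1_like, w3_like]
```

Under the stated assumption both schedules give the same result, because
in either one each rule is applied to a whole sequence before the next
rule touches that sequence.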
5. Rule N0 says:
. For each bracket-pair element in the list of pairs of text positions
a. Inspect the bidirectional types of the characters enclosed
within the bracket pair.
b. If any strong type (either L or R) matching the embedding
direction is found, set the type for both brackets in the pair
to match the embedding direction.
First, what is meant here by "strong type [...] matching the embedding
direction"? Does the "match" here consider only the odd/even value of
the current embedding level vs R/L type, in the sense that odd levels
"match" R and even levels "match" L? Or does this mean some other
kind of matching? Table 3, which is the only place that seems to address
the issue, is not entirely clear, either:
e The text ordering type (L or R) that matches the embedding level
direction (even or odd).
Again, the sense of the "match" here is not clear.
Next, what is meant here by "the characters enclosed within the
bracket pair"? If the bracket pair encloses another bracket pair,
which is inner to it, do the characters inside the inner pair count
for the purposes of resolving the level of the outer pair?
Lastly, I presume that by "the bidirectional types of the enclosed
characters" the text means the resolved types as modified by the
preceding phases, not the original types. Is that correct?
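For concreteness, the parity reading of "matching the embedding
direction" could be sketched like this in Python. Both the parity rule
(even level means L, odd level means R) and the choice to look at every
character between the two brackets, including those inside nested
pairs, are assumptions made for illustration; they are exactly the
points being asked about. The function names are hypothetical.

```python
def embedding_direction(level):
    """Assumed parity reading: even levels are L, odd levels are R."""
    return "R" if level % 2 else "L"


def has_matching_strong(types, open_pos, close_pos, level):
    """True if any type strictly between the brackets equals the
    embedding direction.

    `types` is assumed to hold the types as resolved by the preceding
    rules, and nested bracket pairs are not skipped.
    """
    e = embedding_direction(level)
    return any(t == e for t in types[open_pos + 1:close_pos])


# Outer brackets at positions 0 and 4, inner brackets at 1 and 3.
types = ["ON", "ON", "L", "ON", "ON"]
```

With these assumptions, has_matching_strong(types, 0, 4, 0) is true for
the outer pair because of the L inside the inner pair, and the same call
with level 1 is false.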
Again, thanks in advance for any help.