Unclear text in the UBA (UAX#9) of Unicode 6.3
eliz at gnu.org
Mon Apr 21 02:55:59 CDT 2014
> Date: Sun, 20 Apr 2014 12:58:23 -0700
> From: Asmus Freytag <asmusf at ix.netcom.com>
> On 4/20/2014 3:24 AM, Eli Zaretskii wrote:
> > Would someone please help understand the following subtleties and
> > obscure language in the UBA document found at
> > http://www.unicode.org/reports/tr9/? Thanks in advance.
> I've tried to give you some explanations. In some places, I concur
> with you that the wording could be improved
> and that such improved wording should be proposed to the UTC (or its
> editorial committee) for incorporation into a future update.
How do we do that?
> For details, see below.
> > 1. In paragraph 3.1.2, near its very end, we have this sentence (with
> > my emphasis):
> > As rule X10 will specify, an isolating run sequence is the unit to
> > which the rules following it are applied, and the last character of
> > ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> > one level run in the sequence is considered to be immediately
> > followed by the first character of the next level run in the
> > sequence during this phase of the algorithm.
> > What does it mean here by "the rules following it"? Following what?
> That looks like a bad referent, but from context, this "it" must be X10.
Ah, so simply saying "the following rules" or "rules following X10"
would be enough.
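To make the X10 wording concrete, here is a minimal Python sketch. It assumes a hypothetical representation in which each level run is a range of text positions and an isolating run sequence is a list of such ranges. The helper names sequence_positions and next_in_sequence are illustrative only and do not come from UAX#9.

```python
from typing import List, Optional, Sequence


def sequence_positions(level_runs: Sequence[range]) -> List[int]:
    """Flatten an isolating run sequence into the order the later rules see.

    The last position of one level run is treated as immediately followed
    by the first position of the next level run, skipping whatever text
    (e.g. the contents of an isolate) lies between them.
    """
    positions: List[int] = []
    for run in level_runs:
        positions.extend(run)
    return positions


def next_in_sequence(level_runs: Sequence[range], pos: int) -> Optional[int]:
    """Return the position that follows `pos` within the sequence, or None."""
    positions = sequence_positions(level_runs)
    i = positions.index(pos)
    return positions[i + 1] if i + 1 < len(positions) else None
```

For example, with "abc" plus an LRI at positions 0-3, the isolate's contents at 4-6 (at a higher level), and a PDI plus "def" at 7-10, the isolating run sequence is [range(0, 4), range(7, 11)], and the position that follows 3 is 7.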
> Bullet 1 could be changed to
> . Create a stack for elements each consisting of a *code point* (Bidi_Paired_Bracket property value)
> and a text position. Initialize it to empty.
> to make things more clear. And a slight wording change might help the
> reader with item 2:
> 2. Compare the *code point for the* closing paired bracket being inspected, or its
> canonical equivalent, to the *code point* (Bidi_Paired_Bracket property value) in the current stack element
> And, to continue
> 3. If the values match, meaning *the character being inspected and the character
> at the text position in the stack* form a bracket pair, then [...]
Right, this makes the description a whole lot more clear.
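A minimal Python sketch of BD16 following the reworded steps: the stack holds (Bidi_Paired_Bracket value, text position) elements, and a closing bracket, or its canonical equivalent, is compared against the stored code point. It assumes the string passed in is already one isolating run sequence and that every listed bracket has bidi class ON. The bracket table is a tiny hypothetical subset of the real property data, and the 63-element stack limit of BD16 is ignored for brevity.

```python
from typing import Dict, List, Tuple

# Hypothetical subset of Bidi_Paired_Bracket data:
# opening bracket -> its Bidi_Paired_Bracket value (the matching closer).
PAIRED_BRACKET: Dict[str, str] = {
    "(": ")", "[": "]", "{": "}",
    "\u2329": "\u232A", "\u3008": "\u3009",
}
CLOSING = set(PAIRED_BRACKET.values())
# U+2329/U+232A are canonically equivalent to U+3008/U+3009.
CANONICAL: Dict[str, str] = {"\u2329": "\u3008", "\u232A": "\u3009"}


def canon(ch: str) -> str:
    return CANONICAL.get(ch, ch)


def find_bracket_pairs(text: str) -> List[Tuple[int, int]]:
    """Return (open, close) position pairs, sorted by opening position."""
    stack: List[Tuple[str, int]] = []  # (Bidi_Paired_Bracket value, position)
    pairs: List[Tuple[int, int]] = []
    for pos, ch in enumerate(text):
        if ch in PAIRED_BRACKET:
            stack.append((PAIRED_BRACKET[ch], pos))
        elif ch in CLOSING:
            # Compare against stack elements from the top downward.
            for depth in range(len(stack) - 1, -1, -1):
                expected, open_pos = stack[depth]
                if canon(expected) == canon(ch):
                    pairs.append((open_pos, pos))
                    del stack[depth:]  # pop the match and everything above it
                    break
            # If nothing matched, this closing bracket is simply ignored.
    return sorted(pairs)
```

With this sketch, "a(b[c)d]" yields only the pair (1, 5): the ")" matches the "(" below the "[" on the stack, which discards the "[" element, so the final "]" finds nothing to pair with.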
> Apply rules W1–W7, N0–N2, and I1–I2 to each of the isolating run sequences.
> For each sequence, [completely] apply each rule in the order in which they appear below.
> The order that one isolating run sequence is treated relative to another does not matter.
> I believe the above restatement expresses the same thing in fewer words.
It does, thanks.
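The restated ordering amounts to two nested loops. The Python sketch below represents each isolating run sequence as a hypothetical list of bidi class names and uses simplified versions of W2 and W3 to show why each rule has to finish over the whole sequence before the next one starts: if W3 ran first, AL would already be R and W2 would never turn the EN into AN.

```python
from typing import Callable, List, Sequence

Rule = Callable[[List[str]], None]


def w2(types: List[str]) -> None:
    """W2: an EN whose nearest preceding strong type (R, L, AL) is AL becomes AN."""
    last_strong = "sos"  # stand-in for sos, which is never AL
    for i, t in enumerate(types):
        if t in ("R", "L", "AL"):
            last_strong = t
        elif t == "EN" and last_strong == "AL":
            types[i] = "AN"


def w3(types: List[str]) -> None:
    """W3: change all AL to R."""
    for i, t in enumerate(types):
        if t == "AL":
            types[i] = "R"


def resolve(sequences: Sequence[List[str]], rules: Sequence[Rule]) -> None:
    # Sequences may be handled in any order relative to each other.
    for seq in sequences:
        # Each rule runs over the entire sequence before the next rule begins.
        for rule in rules:
            rule(seq)
```

Calling resolve([["AL", "EN"], ["L", "EN"]], [w2, w3]) leaves the sequences as [["R", "AN"], ["L", "EN"]], whereas the wrong order [w3, w2] would leave the first one as ["R", "EN"].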
> > 5. Rule N0 says:
> > . For each bracket-pair element in the list of pairs of text positions
> > a. Inspect the bidirectional types of the characters enclosed
> > within the bracket pair.
> > b. If any strong type (either L or R) matching the embedding
> > direction is found, set the type for both brackets in the pair
> > to match the embedding direction.
> > First, what is meant here by "strong type [...] matching the embedding
> > direction"? Does the "match" here consider only the odd/even value of
> > the current embedding level vs R/L type, in the sense that odd levels
> > "match" R and even levels "match" L? Or does this mean some other
> > kind of matching? Table 3, which is the only place that seems to refer
> > to the issue, is not entirely clear, either:
> > e The text ordering type (L or R) that matches the embedding level
> > direction (even or odd).
> > Again, the sense of the "match" here is not clear.
> The even/odd vs. R/L match might be made more explicit.
I agree this should be made more explicit, as this is a somewhat
subtle issue that might trip the reader.
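One way to spell out the even/odd vs. R/L match is the following Python sketch of N0 step b. It assumes `types` holds the bidi classes of one isolating run sequence and `level` is that sequence's embedding level; the function names are hypothetical. The EN/AN-as-R mapping follows N0's own statement that EN and AN are treated as R within that rule.

```python
from typing import List, Optional


def embedding_direction(level: int) -> str:
    """Even embedding levels match L; odd embedding levels match R."""
    return "L" if level % 2 == 0 else "R"


def n0_strong_type(bidi_type: str) -> Optional[str]:
    """Strong direction of a type for N0 purposes (EN and AN count as R).

    AL does not appear here because W3 has already changed it to R.
    """
    if bidi_type == "L":
        return "L"
    if bidi_type in ("R", "EN", "AN"):
        return "R"
    return None  # neutral or weak types have no strong direction


def n0_step_b(types: List[str], open_pos: int, close_pos: int, level: int) -> bool:
    """If a strong type inside the pair matches the embedding direction,
    set both brackets to that direction and return True."""
    e = embedding_direction(level)
    if any(n0_strong_type(t) == e for t in types[open_pos + 1:close_pos]):
        types[open_pos] = types[close_pos] = e
        return True
    return False  # step b does not apply; N0 continues with step c
```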
> > Next, what is meant here by "the characters enclosed within the
> > bracket pair"? If the bracket pair encloses another bracket pair,
> > which is inner to it, do the characters inside the inner pair count
> > for the purposes of resolving the level of the outer pair?
> They do, so there's no need to change the text.
It might be a good idea to say that explicitly, e.g. as a note, or at
least provide another example where the strong characters are only
inside an inner bracket pair, which will send the same message to the reader.
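An example of the kind suggested here, as a Python sketch: the only strong character sits inside an inner bracket pair, yet because the outer pair's enclosed range includes the inner brackets and their contents, both pairs resolve to R at an odd embedding level. The types list and pair positions are made up for illustration, and only N0 step b is modeled.

```python
from typing import List, Tuple


def embedding_direction(level: int) -> str:
    # Even levels correspond to L, odd levels to R.
    return "L" if level % 2 == 0 else "R"


def enclosed_types(types: List[str], pair: Tuple[int, int]) -> List[str]:
    # Everything strictly between the two brackets, including any inner
    # bracket pairs and the characters they enclose.
    open_pos, close_pos = pair
    return types[open_pos + 1:close_pos]


def apply_n0_step_b(types: List[str], pairs: List[Tuple[int, int]], level: int) -> None:
    e = embedding_direction(level)
    for pair in sorted(pairs):  # pairs in order of their opening bracket
        # Within N0, EN and AN are treated as R.
        strong = ["R" if t in ("EN", "AN") else t for t in enclosed_types(types, pair)]
        if e in strong:
            types[pair[0]] = types[pair[1]] = e


# Hypothetical sequence "( [ R ] )": the only strong type is in the inner pair.
types = ["ON", "ON", "R", "ON", "ON"]
apply_n0_step_b(types, [(0, 4), (1, 3)], level=1)
```

After the call, all five entries are "R": the outer pair sees the R through the inner pair, which is exactly the behavior described above.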
Thanks again for the clarifications.