Ilya noted:
> [Below, I completely ignore BIDI part of the specification, and
> concentrate ONLY on the parens match. I do not understand why this
> question is interlaced with BIDI determination; I trust that it is.]
Actually, it is, because the bracket-matching is really only interesting
in the cases where the boundaries of the isolating runs are in
question, and there are some directional differences in the runs.
The whole point of introducing the paired bracket complication was
to deal with edge cases for that, but...
> So one may ask: what will be the result of the CURRENT UNICODE parsing
> applied to Philippe’s example?
>
> This is an [«] example [»] for demonstration only.
That is easily answered. Let's crank up the bidi reference code with
a shorter example that contains the relevant units: a [«] b [»] c
Turn up the trace output to see what rule N0 is actually doing,
and you get the following. (Set your display wide enough to not wrap the output
lines, for best interpretation.)
Trace: Entering br_UBA_ResolvePairedBrackets [N0]
Trace: br_PushBracketStack, bracket=005D, pos=2
Trace: br_PeekBracketStack, stack=00614808, top=00614810, tsptr=00614810
Trace: br_PeekBracketStack, bracket=005D, pos=2
Appended pair: opening pos 2, closing pos 4
Trace: br_PopBracketStack, #elements=1
Matched bracket
Trace: br_PushBracketStack, bracket=005D, pos=8
Trace: br_PeekBracketStack, stack=00614808, top=00614810, tsptr=00614810
Trace: br_PeekBracketStack, bracket=005D, pos=8
Appended pair: opening pos 8, closing pos 10
Trace: br_PopBracketStack, #elements=1
Matched bracket
Trace: Entering br_SortPairList
Pair list: {2,4} {8,10}
Append at end
Trace: Exiting br_SortPairList
Pair list: {2,4} {8,10}
Debug: No strong direction between brackets
Debug: No strong direction between brackets
Current State: 14
Text: 0061 0020 005B 00AB 005D 0020 0062 0020 005B 00BB 005D 0020 0063
Bidi_Class: L WS ON ON ON WS L WS ON ON ON WS L
Levels: 0 0 0 0 0 0 0 0 0 0 0 0 0
Runs: <L------------------------------------------------------------L>
Because of the way the stack processing is defined, the first bracket pair is [«]
and the second bracket pair is [»]. The algorithm does not push down potential
matches while looking for the largest outer pair to match. One could, particularly
if one is mathematically inclined, argue that this is not the right way to do the
matching, but it *is* the way the algorithm is currently defined. And it is the
way both of the bidi reference implementations, all of the BidiCharacterTest.txt
data, the ICU implementation, the Microsoft implementation, and the HarfBuzz
implementation are defined, to the best of my knowledge. Other implementations
would have to be doing the same, or they would be failing the conformance tests
in BidiCharacterTest.txt.
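For readers who want to experiment without building the reference code, here is
a rough Python sketch of the stack processing described above. It is not the
reference implementation: the function name bracket_pairs and the PAIRS table
are made up for illustration, only the ASCII bracket pairs are listed (the real
data is in BidiBrackets.txt), and the stack depth limit, canonical equivalents
and the check that the bracket is still of Bidi_Class ON are all left out.

```python
# Illustrative subset of BidiBrackets.txt: opening bracket -> closing bracket.
# Assumption: ASCII pairs only. The guillemets are not paired brackets.
PAIRS = {"(": ")", "[": "]", "{": "}"}
CLOSERS = set(PAIRS.values())


def bracket_pairs(text):
    """Return sorted (opening pos, closing pos) pairs found by a simple stack scan."""
    stack = []  # entries are (expected closing bracket, position of the opener)
    pairs = []
    for pos, ch in enumerate(text):
        if ch in PAIRS:
            # Opening bracket: remember which closer it is waiting for.
            stack.append((PAIRS[ch], pos))
        elif ch in CLOSERS:
            # Closing bracket: search from the top of the stack downwards.
            for i in range(len(stack) - 1, -1, -1):
                if stack[i][0] == ch:
                    pairs.append((stack[i][1], pos))
                    del stack[i:]  # pop the match and everything above it
                    break
            # If nothing on the stack matches, the closer is simply skipped.
    return sorted(pairs)


print(bracket_pairs("a [«] b [»] c"))
```

On the short example this gives the same pair list as the trace, {2,4} and
{8,10}. Note that an opener sitting above the matched one on the stack is
discarded rather than kept for a later closer.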
Note that for an all left-to-right run of text like this, with no isolating runs and
no embeddings, the implications of rule N0 are trivial and uninteresting. The
bracket matches don’t end up *doing* anything relevant to the text reordering
for bidi in this example. But once you start mixing directions of text and adding embeddings
and isolating runs, then things get complicated in non-trivial ways for the output.
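To make that a bit more concrete, here is a simplified Python sketch of how
rule N0 picks a direction for one bracket pair once the pair positions are
known. The name resolve_pair and its parameters are hypothetical, the input is
assumed to be the Bidi_Class values of a single isolating run sequence after
the W rules, and the sketch ignores the fact that brackets resolved earlier
change the context seen by later pairs.

```python
# For N0, EN and AN are treated as R when looking for strong types.
STRONG = {"L": "L", "R": "R", "EN": "R", "AN": "R"}


def resolve_pair(classes, open_pos, close_pos, embedding, sos):
    """Return the direction given to one bracket pair, or None if left alone.

    classes   -- list of Bidi_Class names for the run sequence (assumed input)
    embedding -- embedding direction, "L" or "R"
    sos       -- start-of-sequence direction, used if no strong type precedes
    """
    inside = {STRONG[c] for c in classes[open_pos + 1:close_pos] if c in STRONG}
    if embedding in inside:
        # A strong type matching the embedding direction occurs inside.
        return embedding
    if inside:
        # Only the opposite direction occurs inside: check the context
        # established before the opening bracket.
        opposite = next(iter(inside))
        context = sos
        for c in reversed(classes[:open_pos]):
            if c in STRONG:
                context = STRONG[c]
                break
        return opposite if context == opposite else embedding
    # No strong type between the brackets: the pair is not changed by N0.
    return None


example = "L WS ON ON ON WS L WS ON ON ON WS L".split()
print(resolve_pair(example, 2, 4, "L", "L"))
```

For the all left-to-right example both pairs fall into the last case, which
corresponds to the two "No strong direction between brackets" lines in the
trace. The middle case is where mixed-direction text starts to matter.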
--Ken
Received on Mon Apr 21 2014 - 19:45:22 CDT