Unclear text in the UBA (UAX#9) of Unicode 6.3
asmusf at ix.netcom.com
Tue Apr 22 11:06:27 CDT 2014
On 4/22/2014 2:19 AM, Ilya Zakharevich wrote:
> I think the crucial problem is with
> 1( 2[ 3( 4] 5) 5b] 6)
> I have two possible interpretations: one matches 2 with 5b, another
> leaves 2 unmatched.
If you read UAX#9, the algorithm works by pushing openers onto a stack.
On finding a closer, it goes down the stack looking for a match. If it
finds one, it discards any enclosed openers; if it finds none, it
discards the closer.
(discard = ignore for further matching, don't treat as a bracket any longer).
So, when we reach 4] we have
1( 2[ 3(
on the stack. The match is with 2[ and 3( is ignored. 1( remains and can
be matched later to 5).
Ultimately 5b] and 6) are ignored.
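To make the walk-through concrete, here is a minimal Python sketch of the stack procedure as described above. It is not the reference implementation: the table CLOSER_FOR and the function match_with_stack are made-up names, only three ASCII bracket types are covered, and the input is assumed to contain brackets only.

```python
# Hypothetical table: opener -> closer (small ASCII subset, for illustration).
CLOSER_FOR = {"(": ")", "[": "]", "{": "}"}

def match_with_stack(brackets):
    """brackets: list of (label, char) pairs containing only bracket characters.

    Returns (pairs, ignored): matched (opener_label, closer_label) tuples
    and the labels discarded along the way.
    """
    stack = []    # openers still available for matching
    pairs = []
    ignored = []
    for label, ch in brackets:
        if ch in CLOSER_FOR:
            stack.append((label, ch))
            continue
        # ch is a closer: go down the stack from the top looking for a match
        for depth in range(len(stack) - 1, -1, -1):
            open_label, open_ch = stack[depth]
            if CLOSER_FOR[open_ch] == ch:
                pairs.append((open_label, label))
                # openers above the match are enclosed, so they are discarded
                ignored.extend(lbl for lbl, _ in stack[depth + 1:])
                del stack[depth:]
                break
        else:
            # no opener on the stack matches: discard the closer
            ignored.append(label)
    return pairs, ignored

example = [("1", "("), ("2", "["), ("3", "("), ("4", "]"),
           ("5", ")"), ("5b", "]"), ("6", ")")]
pairs, ignored = match_with_stack(example)
```

On the example, this pairs 2 with 4 and 1 with 5, and discards 3, 5b and 6.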
I believe that your scheme does not match the PBA in that it assumes
that brackets are hierarchical and attempts to preserve the best
hierarchy, whereas the PBA assumes that pairs that are closer together
are more likely to be correct matches (in non-mathematical texts,
hierarchies are not the norm, and they are shallow at best).
What the PBA actually does can now be put into a definition plus a rule,
neither of which uses "stack" or other implementation details, such as
"variables" or "lists".
D A bracket pair is a pair of characters, an opening paired bracket and
a closing paired bracket, within the same isolating run sequence,
such that the Bidi_Paired_Bracket property value of the former
character or its canonical equivalent equals the latter character or
its canonical equivalent.
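The property test in D could be sketched in Python as below. This is only an illustration: PAIRED is a tiny hand-written stand-in for the Bidi_Paired_Bracket data, can_form_pair is a made-up name, and the "same isolating run sequence" condition is not modelled. Canonical equivalence is approximated by comparing NFD forms, which is enough for U+2329/U+232A, whose canonical decompositions are U+3008/U+3009.

```python
import unicodedata

# Hand-written stand-in for Bidi_Paired_Bracket (opener -> closer); the
# real data has many more entries.
PAIRED = {"(": ")", "[": "]", "{": "}", "\u3008": "\u3009"}

def canon(ch):
    # Use the NFD form as the representative of a canonical equivalence class.
    return unicodedata.normalize("NFD", ch)

def can_form_pair(opener, closer):
    """True if opener and closer satisfy the property test of definition D."""
    return PAIRED.get(canon(opener)) == canon(closer)
```

With this, can_form_pair("\u2329", "\u3009") holds even though the two characters come from different blocks, because U+2329 is canonically equivalent to U+3008.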
R Characters are resolved into resolved bracket pairs as follows:
Starting at the beginning of the text, when a closing bracket
is encountered, find the nearest preceding opening character that is
not part of a resolved pair, is not ignored for pair resolution, and
can form a bracket pair with it. If one exists, resolve the pair, and
mark any enclosed brackets of any kind as ignored. Otherwise, if no pair
can be resolved, mark the closing bracket as ignored.
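Rule R can be followed literally, without a stack, by keeping a per-character state. The Python sketch below is one possible reading, not normative text: OPEN_TO_CLOSE and resolve_pairs are made-up names, only three ASCII bracket types are covered, canonical equivalence is left out, and "mark any enclosed brackets as ignored" is taken to apply only to brackets that are not already part of a resolved pair.

```python
OPEN_TO_CLOSE = {"(": ")", "[": "]", "{": "}"}  # illustrative subset
CLOSERS = set(OPEN_TO_CLOSE.values())

def resolve_pairs(text):
    """Return (pairs, ignored) as index tuples / indices into text."""
    state = {}   # index -> "resolved" or "ignored"
    pairs = []
    for i, ch in enumerate(text):
        if ch not in CLOSERS:
            continue
        # Find the nearest preceding opener that is neither resolved nor
        # ignored and that can form a bracket pair with this closer.
        for j in range(i - 1, -1, -1):
            if j in state:
                continue
            if OPEN_TO_CLOSE.get(text[j]) == ch:
                pairs.append((j, i))
                state[j] = state[i] = "resolved"
                # Enclosed openers that are still unresolved become ignored.
                for k in range(j + 1, i):
                    if text[k] in OPEN_TO_CLOSE and k not in state:
                        state[k] = "ignored"
                break
        else:
            # No pair can be resolved: the closer itself is ignored.
            state[i] = "ignored"
    ignored = sorted(k for k, v in state.items() if v == "ignored")
    return pairs, ignored
```

For the string "([(])])", which is the 1( 2[ 3( 4] 5) 5b] 6) example, this resolves the pairs at indices (1, 3) and (0, 4) and marks indices 2, 5 and 6 as ignored.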
What this shows is that the text in BD16 of UAX#9 tries to cover both a
definition and a rule, which is what makes it so difficult to follow.
I think what should be proposed is such a breakdown into a smaller
definition that speaks to the matching of properties (modulo canonical
equivalence), separate from the strategy for resolving actual pairs,
which is better stated as a rule.
The rule does not need to use implementation language to be definite.
A "resolved" bracket pair is simply the actual pair resolved by rule "R";
the rest of the PBA acts on "resolved" pairs.