Re: Unclear text in the UBA (UAX#9) of Unicode 6.3

From: Ilya Zakharevich <nospam-abuse_at_ilyaz.org>
Date: Fri, 25 Apr 2014 01:11:26 -0700

On Wed, Apr 23, 2014 at 06:15:44PM -0700, Asmus Freytag wrote:
> On 4/23/2014 4:41 PM, Ilya Zakharevich wrote:
> >>> GREED) Given any close-delimiter marked as “non-matching”, its
> >>> pre-context does not contain any open-delimiter which could
> >>> match it.
> >>>
> >>> Here pre-context of a position is a concatenation of substrings of the
> >>> initial string:
> >>> • Take the most deeply nested “matched pair” containing the position
> >>> (if none, the whole string);
> >>> • take the part of the string inside this pair AND before the position;
> >>> • remove all “matched” pairs completely contained inside this substring
> >>> together with what they enclose.
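
For concreteness, the pre-context construction quoted above can be sketched in Python. This is only an illustration under stated assumptions: the names PAIRS, pre_context and violates_greed are hypothetical, the delimiter table is a toy stand-in for the Bidi_Paired_Bracket data, and the matched pairs are assumed to be given and properly nested. It checks the GREED condition as worded in this thread; it is not the UBA bracket-pairing procedure itself.

```python
# Hypothetical toy table: opener -> closer. The real UBA consults the
# Bidi_Paired_Bracket property; this dict is an assumption for illustration.
PAIRS = {"(": ")", "[": "]", "{": "}"}


def pre_context(matched, pos):
    """Return the string indices that form the pre-context of `pos`.

    `matched` is a list of (open_index, close_index) pairs, assumed to be
    properly nested.
    """
    # Most deeply nested matched pair containing pos: with proper nesting
    # it is the enclosing pair whose opener is rightmost. If there is no
    # enclosing pair, start = -1, i.e. the whole string is used.
    start = max((o for o, c in matched if o < pos < c), default=-1)
    # Matched pairs completely contained in the part before pos are
    # removed together with everything they enclose.
    removed = set()
    for o, c in matched:
        if start < o and c < pos:
            removed.update(range(o, c + 1))
    # What is left of the part inside the enclosing pair and before pos.
    return [i for i in range(start + 1, pos) if i not in removed]


def violates_greed(text, matched, pos):
    """True if the non-matching closer at `pos` has, in its pre-context,
    an opener that could match it (by the PAIRS table alone)."""
    closer = text[pos]
    return any(PAIRS.get(text[i]) == closer for i in pre_context(matched, pos))
```

For the string a(b[c]d)e) with matched pairs at indices (1, 7) and (3, 5), the pre-context of the final “)” consists of the characters “a” and “e” only, so marking it as non-matching is consistent with GREED. For the string (a) with no pairs marked as matched, the pre-context of “)” contains “(”, so that marking violates GREED.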

> >>Can you explain why, if you make "pre-context" simply the part of the
> >>whole string that precedes the unmatched close-delimiter, the words
> >>"which could match it" are insufficient?
> >Aha, this means that my description is INCOMPLETE: you got a wrong
> >impression of what “match” means! Everywhere, this word means exactly
> >the same as in the MATCH rule: that Unicode codepoints match following
> >Unicode properties.

> >This is a non-recursive definition. All rules are independent.

> That explains why you repeat most of the other constraints in your
> pre-context.

Frankly speaking, I do not see any such repetition.

> For a static definition, would it have been simpler to break the
> definition into two - say a "tentative parsing" (all conditions but
> greed) and "selected parsing", which could be defined as the parsing
> that starts closest to the left.

I do not see how: to know whether a closing delimiter may be matched
or not, it is not enough to know a “tentative” parsing of what precedes
it; one must know the **actual** parsing. Eventually, you would end
up with either a recursive definition, or a definition of a “process”
of parsing.

Anyway, I’ve written my share of definitions which combine
“tentative” stuff with a “best choice” among tentative variants. One
ends up with monsters like
  http://perldoc.perl.org/perlre.html#Combining-RE-Pieces
(and, Eli, the fact that I wrote it does not imply that I must like it :-[ ).

In the case of Perl RExes, there is no alternative. IMO, if there IS
a way to define what a “standalone” GOOD THING is, it is __much__
better than the “best of many” way. Defining it as “the best of
potentially good things” requires the reader to imagine first ALL the
potentially good things; only when this (otherwise not very useful)
universe has settled down in the reader’s mind would they be able to
pick out the best guy…

Ilya
Received on Fri Apr 25 2014 - 03:12:44 CDT