From: Andy Heninger (andy.heninger@gmail.com)
Date: Fri Oct 05 2007 - 12:21:45 CDT
On 10/4/07, Mike <mike-list@pobox.com> wrote:
>
> > With strings in sets at all, separately from the question of how to do
> > set negation, I'm not sure how matching should work. Which choice is
> > selected if more than one is possible? Should backtracking try
> > additional choices if the first one doesn't lead to an overall match?
> > If sets don't have an implied ordering, do we need to require a POSIX
> > style longest match, which could be slow?
>
> In a set, I keep track of the strings by their length, so the longest
> match is always found. I don't think you want to backtrack and try a
> shorter string since the longer match is supposed to be treated as a
> unit....
>
> > Should the set [^xyz\q{ch}] match the 'c' in "ch" ?
>
> I don't think so; since the \q{ch} matches "ch", the negated set does
> not match at the first position.
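
A minimal sketch of the rule described above, in Python. It is an illustration only, not Mike's actual implementation: the function name match_set and its parameters are hypothetical, and it assumes a negated set consumes exactly one code point when no member matches at the current position.

```python
def match_set(chars, strings, text, pos, negated=False):
    """Return the end index of the set match at pos, or None if no match.

    chars   -- single-character members, e.g. set("xyz")
    strings -- multi-character members, e.g. ["ch"] for \\q{ch}
    """
    if pos >= len(text):
        return None
    hit = None
    # Try string members longest-first so the longest match always wins.
    for s in sorted(strings, key=len, reverse=True):
        if text.startswith(s, pos):
            hit = pos + len(s)
            break
    # Fall back to single-character members only if no string matched.
    if hit is None and text[pos] in chars:
        hit = pos + 1
    if negated:
        # Assumption: a negated set matches one code point, and only
        # when no member (character or string) matches at pos.
        return pos + 1 if hit is None else None
    return hit


# [^xyz\q{ch}] against "ch": \q{ch} matches at 0, so the negated set fails.
print(match_set(set("xyz"), ["ch"], "ch", 0, negated=True))  # None
# [^xyz\q{ch}] against "ca": no member matches at 0, so 'c' is consumed.
print(match_set(set("xyz"), ["ch"], "ca", 0, negated=True))  # 1
```
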
The choices you have made seem reasonable to me.
But what would DFA-based (non-backtracking) implementations
do? It would be very difficult for them not to take a shorter string from a
set if that led to a longer overall match. Would it be OK, and still useful,
if the UTS left the behavior unspecified?
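
To make the concern concrete, here is a small Python sketch of the two behaviors for a pattern like [a\q{ab}](bc)? applied to "abc". The helper names are hypothetical, and the second function does not build a DFA; it only computes the result that a leftmost-longest matcher would be expected to report.

```python
def set_ends(members, text, pos):
    """End positions of every set member that matches at pos, longest first."""
    ends = {pos + len(m) for m in members if text.startswith(m, pos)}
    return sorted(ends, reverse=True)


def with_tail(text, end, tail):
    """Extend a match ending at `end` by the optional literal `tail`."""
    return end + len(tail) if text.startswith(tail, end) else end


def match_unit(members, tail, text):
    """Longest set member is treated as a unit; shorter members are never retried."""
    ends = set_ends(members, text, 0)
    return with_tail(text, ends[0], tail) if ends else None


def match_longest_overall(members, tail, text):
    """POSIX-style result: try every member, keep the longest overall match."""
    totals = [with_tail(text, end, tail) for end in set_ends(members, text, 0)]
    return max(totals) if totals else None


members = ["a", "ab"]          # the set [a\q{ab}]
print(match_unit(members, "bc", "abc"))             # 2: matches "ab"
print(match_longest_overall(members, "bc", "abc"))  # 3: matches "abc"
```

The two results differ because the second approach takes the shorter member "a" so that the optional "bc" can also match.
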
-- Andy
> > I'm half inclined to move strings, or literal clusters, into section 3,
> > then move the entire section 3 of UTS-18 into a separate document for
> > interesting, but not fully worked out, ideas.
>
> This seems like a good idea.
>
> Mike
>