From: Philippe Verdy (verdy_p@wanadoo.fr)
Date: Tue Oct 09 2007 - 23:30:54 CDT
Hans Aberg [mailto:haberg@math.su.se] wrote:
> On 10 Oct 2007, at 01:11, Philippe Verdy wrote:
> > Anyway, you're still not replying to the question: which is the
> > expected
> > ordering for the returned matches?
>
> If you know what language the grammar produces, then you know the
> language complement. And the DFA construct shows that if the grammar
> is RE (Chomsky hierarchy type 3), there exists an RE grammar that
> produces the complement language.
I know all that (in fact I had already said the same thing before: the
language depends on the alphabet that "." recognizes partly, with the
exception of newlines that may be outside of the subset of the alphabet
matched by ".").
But a DFA or NFA approach absolutely does not matter at all semantically. It
absolutely does not influence the language recognized, and so cannot
influence the negation of this same language. Don't mix the implementation
(NFA/DFA...) with the language that the implementation is supposed to
recognize.
> > I absolutely don't care if you use a DFA or not.
>
> So once you have decided what the abstract symbols you use in your RE-
> like algebra should mean grammar-wise (hopefully producing an RE
> grammar), or what DFA they should produce, you can use your favorite
> method to compute an RE grammar that produces the language complement.
Once again, the DFA argument does not matter at all. It absolutely does not
influence the language or its algebra. But you are still missing the point:
the order of matches DOES matter in all practical uses: whatever a regexp
will match, according to its algebra, this will be according to it only a
unordered set, that the client application will process in an ordered
fashion; just consider the two typical cases:
- of a text editor performs a regexp search to move the cursor position or
the current selection to the next or previous match: it can't use all the
possible matches at the same time. This means that it will take a decision
about the first match that satisfies the regexp. No DFA involved here.
- of a automated search-&-replaceoperation: it is critical for the order to
be consistent, because it will not replace all matches at the same time
(don't forget that within the set of matches, there may be a lot that
overlap within the set of possible matched). Here again there's anarbitrary
decision, but to make the process predictable, you need an ordering of
matches where the first of them will be the first considered for
substitution. No DFA involved here, it does not matter, theonly thing that
matters is the ordering of matches, not the set of matches returned by the
regexp.
> I do not know why this complement operation is important, though.
Because it appears in contextual searches. Ofcourse you may implement a
regexp engine that supports NO negated regexp searches (and no negated
character classes), but it will be impossible to match some items, using a
finite regexp string. And even if you don't want to support negated
searches, the question of ordering still remains, and becomes critical when
you want to support capturing groups (due to conflicting "interest" between
the multiple capturing groups present in the regexp). And even if you don't
support capturing groups, the question of order still remains with regexps
that terminate by a "*" or "+" repeater (but then only a small part of the
order matters).
This archive was generated by hypermail 2.1.5 : Tue Oct 09 2007 - 23:35:11 CDT