From: Mike (mike-list@pobox.com)
Date: Mon Oct 01 2007 - 10:57:13 CST
>> I think it's a bad idea for \q to have the side
>> effect of changing the meaning of ".".
>
> Well if you don't do that, then [^set\q{ch}] becomes inconsistent and does
> not return the user-expected result, i.e. the exact complement of what
> [set\q{sh}] matches, according to ".".
No, there is no inconsistency. When my compiler encounters a
character class, it creates a new matcher object for it; it
doesn't use the "." matcher (a predefined object).
> [...] as soon as you are introducing collation elements
> in regexps, these are sorted by collation, and collations are
> locale-sensitive...
I don't see why they need to be sorted. All that matters is
that you find the longest match. [a-z\q{ch}] will match "ch"
in "chinchilla" rather than just "c".
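To illustrate the longest-match rule, here is a minimal Python sketch. It is not the actual compiler code: the class name CharClass and the method match_len are hypothetical, and it assumes a set is just a list of inclusive ranges plus a list of \q{...} strings. No sorting by collation is involved; the strings are only tried longest first.

```python
class CharClass:
    # Hypothetical matcher for a set such as [a-z\q{ch}]; illustration only.

    def __init__(self, ranges, strings=()):
        # ranges: inclusive (low, high) pairs of single characters
        self.ranges = list(ranges)
        # Multi-character members, tried longest first; empty strings dropped.
        self.strings = sorted((s for s in strings if s), key=len, reverse=True)

    def match_len(self, text, pos):
        """Return the length of the longest match at pos, or 0 if none."""
        for s in self.strings:
            if text.startswith(s, pos):
                return len(s)
        if pos < len(text):
            ch = text[pos]
            if any(lo <= ch <= hi for lo, hi in self.ranges):
                return 1
        return 0


cc = CharClass([("a", "z")], ["ch"])
print(cc.match_len("chinchilla", 0))  # "ch" is preferred over just "c"
```

Each set gets its own object here, so nothing about the predefined "." matcher is touched.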
> In addition, the meaning of ranges in sets like [a-z] should also be
> consistent with the collation used...
I disagree with this. I think that having [a-z] magically
mean all characters in a particular language is asking for
trouble. In French, would you say that [a-z] should match
C WITH CEDILLA or A + ACUTE?
It's my opinion that ranges inside [] should be simple binary
order. If you want to do anything fancier, there should be
new syntax for it.
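A small Python sketch of what "simple binary order" means in practice, assuming ranges are compared by code point. The helper name in_range is hypothetical. Under this rule [a-z] matches neither C WITH CEDILLA nor precomposed A WITH ACUTE, and for the decomposed form it matches only the base letter "a", not the combining accent.

```python
def in_range(ch, low, high):
    """True if ch lies between low and high in code point order."""
    return ord(low) <= ord(ch) <= ord(high)


c_cedilla = "\u00e7"        # LATIN SMALL LETTER C WITH CEDILLA
a_acute = "\u00e1"          # LATIN SMALL LETTER A WITH ACUTE (precomposed)
combining_acute = "\u0301"  # COMBINING ACUTE ACCENT

for ch in ["m", c_cedilla, a_acute, "a", combining_acute]:
    print("U+%04X" % ord(ch), in_range(ch, "a", "z"))
```

Anything locale-aware would then need its own syntax rather than changing what [a-z] means.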
Mike