Geoffrey> May I suggest that using POSIX style equivalence classes would
Geoffrey> be a better syntax? E.g. [[:alnum:]] specifies all alpha-numeric
Geoffrey> characters. [^[:alnum:]] specifies anything but alpha-numeric
Geoffrey> characters. [^[:alnum:][:space:]] specifies anything but
Geoffrey> alpha-numeric and whitespace characters.
Geoffrey> From what I can glean from my BSD man page, it is even POSIX
Geoffrey> compliant to add new classes such as non-spacing characters or
Geoffrey> the different blocks. Of course providing something like
Geoffrey> [[:greek:]] leaves it open to debate whether it should be the
Geoffrey> U+0370 - U+03FF block or if it should also include the other
Geoffrey> bits of Greek scattered around.
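
To make the [[:greek:]] question concrete, here is a minimal Python sketch
of two candidate definitions. Both function names are hypothetical and
neither is taken from any existing implementation; the second one assumes
that "Greek" means "the Unicode character name starts with GREEK", which
is only one of several possible readings.

```python
import unicodedata


def in_greek_block(ch: str) -> bool:
    """Hypothetical definition 1: the contiguous U+0370..U+03FF block only."""
    return 0x0370 <= ord(ch) <= 0x03FF


def has_greek_name(ch: str) -> bool:
    """Hypothetical definition 2: any assigned character whose Unicode
    name starts with "GREEK", wherever it lives in the code space."""
    return unicodedata.name(ch, "").startswith("GREEK")


# Characters on which the two definitions agree or disagree:
#   U+03B1 GREEK SMALL LETTER ALPHA             -> block: yes, name: yes
#   U+1F00 GREEK SMALL LETTER ALPHA WITH PSILI  -> block: no,  name: yes
#   U+03E2 COPTIC CAPITAL LETTER SHEI           -> block: yes, name: no
for ch in ("\u03b1", "\u1f00", "\u03e2"):
    print(f"U+{ord(ch):04X}", in_greek_block(ch), has_greek_name(ch))
```

The two definitions disagree in both directions: the block contains
Coptic letters, and polytonic Greek sits outside it.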
I already have the basic POSIX equivalence classes, but was under the
impression they were simply codifying existing practice and didn't know there
was room for additions. Looks like there's gonna be a bit of debate over
naming and constituents of additional equivalence classes :-)
I don't want to spend a lot of time guessing which combinations of Unicode
character type properties are going to be wanted just to generate equivalence
class names. Besides, the business of naming inevitably provokes argument.
So, I'll stick with two non-standard, but simple and flexible constructs to
give us the matching resolution we need until strict conformance to some set
of conventions is dictated by circumstances.
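
For illustration only, here is a rough Python sketch of how the basic
bracket classes might be backed by Unicode character properties. It is
not the implementation discussed in this message: the table name
POSIX_CLASSES, the function in_classes, and the choice of property for
each class are all assumptions made for the example.

```python
import unicodedata

# Assumed mapping from POSIX class names to Unicode-aware predicates.
POSIX_CLASSES = {
    "alpha": str.isalpha,
    "digit": lambda ch: unicodedata.category(ch) == "Nd",
    "alnum": lambda ch: ch.isalpha() or unicodedata.category(ch) == "Nd",
    "space": str.isspace,
    "upper": str.isupper,
    "lower": str.islower,
    "punct": lambda ch: unicodedata.category(ch).startswith("P"),
}


def in_classes(ch, names, negate=False):
    """Emulate [[:a:][:b:]] for a single character, or [^[:a:][:b:]]
    when negate is True."""
    hit = any(POSIX_CLASSES[name](ch) for name in names)
    return hit != negate


# [^[:alnum:][:space:]] accepts "!" but rejects "a" and " ".
print(in_classes("!", ["alnum", "space"], negate=True))
print(in_classes("a", ["alnum", "space"], negate=True))
print(in_classes(" ", ["alnum", "space"], negate=True))
```

Every new class name would need another row in such a table, and each
row is a decision about which properties belong to it.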
-----------------------------------------------------------------------------
mleisher@crl.nmsu.edu
Mark Leisher
Computing Research Lab
New Mexico State University
Box 30001, Dept. 3CRL
Las Cruces, NM 88003

"A designer knows he has achieved perfection
not when there is nothing left to add, but
when there is nothing left to take away."
    -- Antoine de Saint-Exupéry