From: Jonathan Coxhead (jonathan@doves.demon.co.uk)
Date: Mon Sep 24 2007 - 05:33:28 CDT
Mike wrote:
> I played around with the ability to add digraphs to "." and came up
> with two methods. The first would be to specifically list them using
> syntax such as:
I'd just like to point out that a "[ ]" regular expression is defined to
always match exactly one character (if it matches at all).
You can write "[abcdef]" as "(a|b|c|d|e|f)" if you like. You can also write
"(a|bb|ccc|dddd|eeeee|ffffff)", but there is no form using "[ ]" to match the
same thing.
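
To make the distinction concrete, here is a minimal sketch using Python's "re" module. The choice of engine is an assumption for illustration only (the thread does not name one), and the helper matched_text is hypothetical. It checks that the bracket form and the single-character alternation agree, and that only the alternation can consume a multi-character branch.

```python
import re

# A bracket expression and the equivalent single-character alternation.
char_class = re.compile(r"[abcdef]")
alternation = re.compile(r"(a|b|c|d|e|f)")

# Alternation whose branches differ in length; there is no "[ ]" form for this.
varying = re.compile(r"(a|bb|ccc|dddd|eeeee|ffffff)")


def matched_text(pattern, text):
    """Return the matched text if the pattern matches all of `text`, else None."""
    m = pattern.fullmatch(text)
    return m.group(0) if m else None


# The two single-character forms agree on every one-character input tried here.
same_on_single_chars = all(
    matched_text(char_class, c) == matched_text(alternation, c)
    for c in "abcdefgxyz"
)

# The bracket form consumes exactly one character, so it cannot match "ccc" whole.
class_matches_ccc = matched_text(char_class, "ccc")

# The alternation has a three-character branch, so it can.
varying_matches_ccc = matched_text(varying, "ccc")
```

Note that in Python's "re" with str patterns, "one character" means one code point; other engines may draw that line differently.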
"[ ]" exists primarily as an optimisation, because matching 1 character
against a set is a fast operation, whereas checking against an unknown number of
alternatives of potentially varying lengths ("( | )") is expensive.
So a sequence specified like [^ ] could never match a whole message, or the
string "New York": it could only match a single character.
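
The same point can be sketched for a negated set, again assuming Python's "re" module purely for illustration. The pattern "[^x]" and the helper consumed are hypothetical stand-ins, not the syntax proposed earlier in the thread.

```python
import re

# Hypothetical negated set: any single character other than "x".
negated = re.compile(r"[^x]")


def consumed(pattern, text):
    """Return the text the pattern consumes at the start of `text`, or None."""
    m = pattern.match(text)
    return m.group(0) if m else None


# Only the first character of the string is consumed.
first = consumed(negated, "New York")

# Requiring the pattern to cover the entire string fails.
whole = negated.fullmatch("New York")
```

Matching a longer string this way needs an explicit repetition (for example "[^x]+"), which is a different expression from the set itself.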
What exactly this means in the context of Unicode is a different matter, but
I imagine some sort of historical consistency is desirable.
-- ... Jonathan Belmont CA 94002