From: Doug Ewell (dewell@roadrunner.com)
Date: Mon Sep 24 2007 - 08:52:04 CDT
"Mike" <mike dash list at pobox dot com> wrote:
>> I'd just like to point out that a "[ ]" regular expression is defined
>> to match always exactly one character (if it matches at all).
>
> Correct. Except that a Spanish speaker would consider "ch" to be a
> single character even though you need two code points to represent it.

I don't think it will ever really be feasible to define regular
expressions in terms of specific languages, to the point of treating
combinations of two or more base characters as a single matchable
"character" on the basis that speakers of language X consider the
combination to be a single "letter."
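
For illustration, here is a minimal Python sketch of the behavior under discussion. It assumes Python's standard "re" module; the helper name and the Spanish-oriented pattern are hypothetical, made up for this example, and are not anything proposed in this thread. It shows that a "[ ]" expression consumes exactly one code point, so "ch" can only be matched as a unit by spelling it out as an explicit alternative.

```python
import re

# A bracket expression matches exactly one code point, if it matches at all.
single = re.compile(r"[a-z]")

def bracket_matches_whole(text: str) -> bool:
    """Return True if the single bracket expression matches the entire string."""
    return single.fullmatch(text) is not None

# "c" is one code point; "ch" is two, so one "[ ]" cannot consume all of it.
c_alone = bracket_matches_whole("c")
ch_digraph = bracket_matches_whole("ch")

# Hypothetical language-specific workaround: list the digraph as an
# alternative ahead of the single-letter class, so it is tried first.
spanish_letter = re.compile(r"ch|[a-zñ]")
letters = spanish_letter.findall("chico")

print(c_alone, ch_digraph, letters)
```

The workaround has to be written by hand for each language and each pattern, which is the kind of per-language definition the paragraph above doubts is feasible in general.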

--
Doug Ewell * Fullerton, California, USA * RFC 4645 * UTN #14
http://users.adelphia.net/~dewell/
http://www1.ietf.org/html.charters/ltru-charter.html
http://www.alvestrand.no/mailman/listinfo/ietf-languages