Re: New Public Review Issue: Proposed Update UTS #18

From: Doug Ewell (dewell@roadrunner.com)
Date: Mon Sep 24 2007 - 08:52:04 CDT

    "Mike" <mike dash list at pobox dot com> wrote:

    >> I'd just like to point out that a "[ ]" regular expression is defined
    >> to always match exactly one character (if it matches at all).
    >
    > Correct. Except that a Spanish speaker would consider "ch" to be a
    > single character even though you need two code points to represent it.

    I don't think it will ever really be feasible to define regular
    expressions in terms of specific languages, to the point of treating
    combinations of two or more base characters as a single matchable
    "character" on the basis that speakers of language X consider the
    combination to be a single "letter."
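
To make the distinction concrete, here is a minimal sketch in Python's `re` module (an illustration only, not anything specified by UTS #18). It shows that a bracket expression consumes one code point, so "ch" can only be treated as a unit by listing it explicitly. The digraph list and the names `single`, `spanish_letter`, and `letters` are hypothetical, chosen for this example.

```python
import re

# A bracket expression matches exactly one code point each time it is applied.
single = re.compile(r"[a-z]")

assert single.fullmatch("c") is not None
# "ch" is two code points, so a single character class cannot match it whole.
assert single.fullmatch("ch") is None

# Treating "ch" as one unit requires spelling it out as an alternative,
# tried before the single-character class. This digraph list is a
# hypothetical, Spanish-flavoured assumption for illustration.
spanish_letter = re.compile(r"ch|ll|[a-zñ]")

def letters(word):
    """Split a lowercase word into 'letters', keeping listed digraphs together."""
    return spanish_letter.findall(word)
```

With these definitions, `letters("chico")` keeps "ch" together as the first element, but only because the pattern names that digraph; a different language would need a different list, which is the scaling problem described above.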

    --
    Doug Ewell * Fullerton, California, USA * RFC 4645 * UTN #14
    http://users.adelphia.net/~dewell/
    http://www1.ietf.org/html.charters/ltru-charter.html
    http://www.alvestrand.no/mailman/listinfo/ietf-languages
    

