From: Philippe Verdy (verdy_p@wanadoo.fr)
Date: Mon Oct 01 2007 - 01:01:33 CST
> cluster, for example. Also, how would you interpret [a\u0300]?
> As (a|\u0300) or (a\u0300)?
I would interpret /[a\u0300]/ unambiguously as /(a|\u0300)/ only. To match a
complete a with its accent:
* we should not need to use the "\u" notation as a helper; instead we should
encode the accent directly in the regexp, or use the precomposed
character (the two forms are canonically equivalent).
* If this is not possible (due to the input encoding for the regexp), then
use \q{} to delimit the unbreakable collation element, as in /[\q{a\u0300}]/
or simply /\q{a\u0300}/ (which is an equivalent regexp here).
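As a rough illustration of the two readings, here is a minimal Python sketch.
It assumes Python's standard "re" module, which has no \q{} syntax, so a
non-capturing group is used as a stand-in for /\q{a\u0300}/; the variable
names are only illustrative.

```python
import re

# A character class is a set of single code points, so [a\u0300]
# behaves like the alternation (a|\u0300).
char_class = re.compile(r"[a\u0300]")

# Assumption: Python's re has no \q{...}; a non-capturing group stands in
# for /\q{a\u0300}/ and matches the two-code-point sequence as one unit.
sequence = re.compile(r"(?:a\u0300)")

text = "a\u0300"  # 'a' followed by U+0300 COMBINING GRAVE ACCENT

class_matches = char_class.findall(text)    # one match per code point
sequence_matches = sequence.findall(text)   # one match for the whole element

print(len(class_matches), len(sequence_matches))
```

The class finds the base letter and the accent as two separate matches, while
the grouped sequence finds the accented letter once and does not match a bare
"a" at all.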
Note how this simple rule does not break the canonical equivalence of the
input regexps, whatever their encoding. The \u notation is not an encoding
but a regexp notation spelled with several characters, so it implies no
canonical equivalence between a regexp that encodes the character directly
and a regexp that uses this notation.
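The distinction between the directly encoded form and the \u notation can be
checked with a small Python sketch. It assumes the standard "unicodedata"
module and treats a raw string as the regexp source text; the variable names
are hypothetical.

```python
import unicodedata

direct = "a\u0300"      # accent encoded directly: 'a' + U+0300, two code points
escaped = r"a\u0300"    # \u notation: seven ASCII characters of regexp source
precomposed = "\u00e0"  # U+00E0 LATIN SMALL LETTER A WITH GRAVE

# Normalizing the directly encoded source yields the precomposed character,
# since the two forms are canonically equivalent.
nfc_direct = unicodedata.normalize("NFC", direct)

# Normalizing the escaped source changes nothing: it is plain ASCII text,
# so canonical equivalence does not apply to it.
nfc_escaped = unicodedata.normalize("NFC", escaped)

print(nfc_direct == precomposed, nfc_escaped == escaped)
```

In other words, normalizing the regexp source may turn the directly encoded
pair into the precomposed letter, but it leaves the \u spelling untouched.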