From: Philippe Verdy (verdy_p@wanadoo.fr)
Date: Tue Oct 02 2007 - 11:05:58 CST
Mark Davis wrote:
> Also, there were some interesting suggestions for syntax additions
> that may be worth mentioning in informative text.
> 1. not equals
> As well as
> * \P{propname=value} and [:^propname=value:]
> to have:
> * \{propname!=value}, \p{propname≠value}
> * [:propname!=value:], [:propname≠value:]
I'm not sure that \{propname!=value} should be defined, recommended or even
suggested: its contextual parsing may complicate things, unlike the 4 others,
which use a distinctive prefix that helps avoid conflicts with the various
uses of the {} notation.
Also, it conflicts with a frequent use of "\{" as the only way to escape the
literal "{" character itself, in regexp syntaxes where "{ ... }" has a special
meaning: either non-capturing groups, as distinct from the capturing "( ... )",
or groups in which unescaped spaces are not matched and only serve as visual
hints in complex regexps. (Within such "{ ... }" groups, literal spaces that
must be matched need to be escaped, just like braces that must be matched
literally instead of taking their default grouping semantics.)
Also you propose mixing \p and \P for similar use. The only good suggestion
is the way to represent the "different" relation using an alternate operator
replacing the equal sign, instead of using a leading negation (using a
capital \P instead of \p, or a leading ^ operator in a class notation)
before the encoded equality.
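
To illustrate why the "different" operator and the leading negation are
interchangeable for a single-valued property, here is a minimal Python
sketch. It only uses the standard unicodedata module; the helper names
(gc_equals, gc_not_equals, gc_negated) are hypothetical and are not part
of any proposed syntax.

```python
import unicodedata

def gc_equals(ch: str, value: str) -> bool:
    """Model of \\p{gc=value} for a two-letter General_Category value."""
    return unicodedata.category(ch) == value

def gc_not_equals(ch: str, value: str) -> bool:
    """Model of the proposed \\p{gc!=value} (alternate operator)."""
    return unicodedata.category(ch) != value

def gc_negated(ch: str, value: str) -> bool:
    """Model of \\P{gc=value} or [:^gc=value:] (leading negation)."""
    return not gc_equals(ch, value)

# Both spellings select the same characters in this sample.
sample = "aA1 ,\u0301"
assert all(gc_not_equals(c, "Lu") == gc_negated(c, "Lu") for c in sample)
```

Under this reading, the alternate operator adds no expressive power; it
only moves the negation next to the relation it negates.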
For the rest, the "[: ... :]" bracketing is easily perceived everywhere as
equivalent to the "{ ... }" bracketing, but having to support it looks much
like the use of multiple characters for representing the same "character" in
programming languages using national versions of ISO 646 that did not have
the "{ }" braces in their encoding. It looks ugly (but is used in POSIX
regexps).
> 2. multiple values(...)
> * \p{gc=L|M|Nd} instead of [\p{gc=L}\p{gc=M}\p{gc=Nd}]
Good suggestion, but it is closely related to your suggestion 3:
> 3. regex values
> * propname=/regexForValue/
> eg
> * \p{name=/MARK/} or equivalently \N{/MARK/}
So multiple values would also be encoded using your suggestion 3 as:
* \p{gc=/L|M|Nd/}
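
A rough Python sketch of that equivalence, under one assumption that is
mine and not in the proposal: a character's gc is taken to have two
candidate values, its two-letter category and the one-letter major class
(so "Lu" also answers to "L"). The helper names are hypothetical.

```python
import re
import unicodedata

def gc_values(ch: str) -> set[str]:
    """Candidate gc values for ch: e.g. {"Lu", "L"} (assumed aliasing)."""
    cat = unicodedata.category(ch)
    return {cat, cat[0]}

def matches_gc_regex(ch: str, value_pattern: str) -> bool:
    """Model of \\p{gc=/pattern/}: the whole value must match the pattern."""
    return any(re.fullmatch(value_pattern, v) for v in gc_values(ch))

def matches_gc_union(ch: str, values: list[str]) -> bool:
    """Model of [\\p{gc=L}\\p{gc=M}\\p{gc=Nd}]: a union of plain equalities."""
    return any(v in gc_values(ch) for v in values)

# The alternation in the value and the explicit union agree on this sample.
for c in "a\u03015\u2167 ":
    assert matches_gc_regex(c, "L|M|Nd") == matches_gc_union(c, ["L", "M", "Nd"])
```

Here U+0301 (a combining mark) and "5" are accepted, while U+2167 (a
letter-like number, gc=Nl) and the space are rejected by both forms.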
What do you mean by \p{name=/MARK/}: does it match any character whose
property value "equals" (is matched entirely by) the regexp, or one whose
value "contains" a match for the regexp? I would not suggest the "contains"
meaning; it is not needed, because that case can be written as:
* \p{name=/.*MARK.*/}
But then, why are the slashes needed? If you look at suggestion 2, the
leading and trailing slashes are not used, yet the multiple values are also
encoded as a regexp. So your suggestion 3 (regexp values) could as well be
supported using the notation in suggestion 2:
* \p{name=MARK} or equivalently \N{MARK}
If you need to encode the "contains" relation rather than the "equals"
relation, I think this relation should be encoded explicitly:
* \p{name=.*MARK.*} or equivalently \N{.*MARK.*}
At least this way, the "=" operator keeps its reading as "equals" in the
notation, and it can then be replaced where needed by a "different" operator
or by a negated assertion containing the "=" operator (which amounts to
"does not contain" if the regexp in the value starts and ends with ".*").
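
The difference between the two readings of "=" can be sketched in Python,
with re.fullmatch standing in for "equals" and re.search for "contains".
The helper names are hypothetical, and unassigned or unnamed characters
are given an empty name here as a simplifying assumption.

```python
import re
import unicodedata

def name_equals(ch: str, pattern: str) -> bool:
    """'Equals' reading: the pattern must match the whole character name."""
    return re.fullmatch(pattern, unicodedata.name(ch, "")) is not None

def name_contains(ch: str, pattern: str) -> bool:
    """'Contains' reading: the pattern may match anywhere in the name."""
    return re.search(pattern, unicodedata.name(ch, "")) is not None

# "!" is named EXCLAMATION MARK.
assert not name_equals("!", "MARK")       # \p{name=MARK} under "equals"
assert name_equals("!", ".*MARK.*")       # explicit \p{name=.*MARK.*}
assert name_contains("!", "MARK")         # what a "contains" reading would do
```

With the "equals" reading, "contains" stays expressible through an explicit
".*" on each side, whereas the reverse would require anchors in the value.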