Re: Pure Regular Expression Engines and Literal Clusters

From: Asmus Freytag via Unicode <unicode_at_unicode.org>
Date: Sun, 13 Oct 2019 17:13:28 -0700
On 10/13/2019 2:54 PM, Richard Wordingham via Unicode wrote:
Besides invalidating complexity metrics, the issue was what \p{Lu}
should match.  For example, with PCRE syntax in GNU grep version 2.25,
\p{Lu} matches U+0100 but not <A, U+0300>.  When I'm respecting
canonical equivalence, I want both to match [:Lu:], and that's what I
do. [:Lu:] can then match a sequence of up to 4 NFD characters.

Formally, wouldn't that amount to rewriting \p{Lu} to match \p{Lu}\p{Mn}*? Instead of formally handling NFD, you could extend the syntax to handle "inherited" properties across combining sequences.
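
To make the rewriting concrete, here is a minimal sketch in Python using only the standard unicodedata module. The helper name match_lu_with_marks is made up for illustration, and the sketch assumes the input has already been converted to NFD; it is not how any particular regex engine does it.

```python
import unicodedata

def match_lu_with_marks(text, pos=0):
    # Hypothetical helper: emulates the rewritten pattern \p{Lu}\p{Mn}*
    # anchored at pos. Returns the end index of the match, or None.
    # Assumption: text is already in NFD.
    if pos >= len(text) or unicodedata.category(text[pos]) != "Lu":
        return None
    end = pos + 1
    # Absorb trailing nonspacing marks so the base letter's property
    # is "inherited" by the whole combining sequence.
    while end < len(text) and unicodedata.category(text[end]) == "Mn":
        end += 1
    return end

precomposed = "\u0100"   # LATIN CAPITAL LETTER A WITH MACRON
decomposed = "A\u0300"   # <A, U+0300>

for s in (precomposed, decomposed):
    nfd = unicodedata.normalize("NFD", s)
    # True when the pattern consumes the whole sequence
    print(match_lu_with_marks(nfd) == len(nfd))
```

Both inputs are consumed in full after NFD conversion, whereas a bare \p{Lu} test on the decomposed form would stop after the base letter.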

Am I missing anything?

A./

Received on Sun Oct 13 2019 - 19:15:22 CDT
