Besides invalidating complexity metrics, the issue was what \p{Lu} should match. For example, with PCRE syntax in GNU grep version 2.25, \p{Lu} matches U+0100 but not <A, U+0300>. When I'm respecting canonical equivalence, I want both to match [:Lu:], and that's what I do. [:Lu:] can then match a sequence of up to 4 NFD characters.
Formally, wouldn't that amount to rewriting \p{Lu} as \p{Lu}\p{Mn}*? Instead of formally handling NFD, you could extend the syntax to handle "inherited" properties across combining sequences.
Am I missing anything?
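
To make the question concrete, here is a minimal sketch of the \p{Lu}\p{Mn}* reading, written against Python's standard unicodedata module. The function name match_lu_with_marks is hypothetical and exists only for illustration. It is not how GNU grep or any particular regex engine implements \p{Lu}, and it only approximates canonical-equivalence matching under the assumption that an uppercase base letter followed by nonspacing marks should count as one "uppercase" unit.

```python
import unicodedata


def match_lu_with_marks(text, pos=0):
    # Hypothetical helper: emulate the pattern \p{Lu}\p{Mn}* anchored at pos.
    # Returns the index just past the match, or None if there is no match.
    if pos >= len(text) or unicodedata.category(text[pos]) != "Lu":
        return None
    end = pos + 1
    # Absorb any nonspacing marks (General_Category Mn) that follow the base.
    while end < len(text) and unicodedata.category(text[end]) == "Mn":
        end += 1
    return end


precomposed = "\u0100"   # LATIN CAPITAL LETTER A WITH MACRON, a single Lu code point
sequence = "A\u0300"     # <A, U+0300>: Lu base followed by one Mn mark

# A plain single-code-point \p{Lu} test covers only the "A" of the sequence;
# the extended pattern covers the whole combining sequence.
for s in (precomposed, sequence, unicodedata.normalize("NFD", precomposed)):
    print([f"U+{ord(c):04X}" for c in s], "->", match_lu_with_marks(s))
```

With this reading, both the precomposed letter and the <base, mark> sequence are consumed in full, which is the behaviour asked for when respecting canonical equivalence. Whether that covers every case that a formal NFD treatment would handle is exactly the open question.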
A./
This archive was generated by hypermail 2.2.0 : Sun Oct 13 2019 - 19:15:22 CDT