Re: Pure Regular Expression Engines and Literal Clusters from Richard Wordingham via Unicode on 2019-10-11 (Unicode Mail List Archive)

From: Richard Wordingham via Unicode <unicode_at_unicode.org>
Date: Fri, 11 Oct 2019 19:18:46 +0100

On Fri, 11 Oct 2019 12:39:56 +0200
Elizabeth Mattijsen via Unicode <unicode_at_unicode.org> wrote:

> Furthermore, Perl 6 uses Normalization Form Grapheme for matching:
> https://docs.perl6.org/type/Cool#index-entry-Grapheme

I seriously doubt that a Thai reader considers each combination of
consonant (44), non-spacing vowel (7) and tone mark (4) a different
character. Moreover, if what you say is correct, Perl 6 will be useless
for finding such combinations in correctly spelled text. The regular
expression

\p{insc=consonant}\p{insc=vowel_dependent}\p{insc=tone_mark}

would find only misspellings, because in correctly spelled Thai each
matching sequence constitutes a single grapheme cluster, which
grapheme-level matching treats as one indivisible unit. I trust Perl 6
will actually continue to support analyses of strings as sequences of
codepoints.
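
To make the point concrete, here is a minimal sketch in Python rather
than Perl 6. It assumes hand-written Thai codepoint ranges as stand-ins
for the three InSC classes, because Python's re module has no
\p{insc=...} support. The names find_cvt and rough_clusters are
hypothetical, and rough_clusters is only a crude approximation of
grapheme segmentation, not an implementation of UAX #29.

```python
import re
import unicodedata

# Hand-written Thai ranges standing in for the InSC classes named in the
# message; an assumption, since Python's re has no \p{insc=...} support.
CONSONANT = "[\u0E01-\u0E23\u0E25\u0E27-\u0E2E]"  # 44 consonants (skips RU, LU)
VOWEL = "[\u0E31\u0E34-\u0E39]"                   # 7 non-spacing vowels
TONE = "[\u0E48-\u0E4B]"                          # 4 tone marks

CVT = re.compile(CONSONANT + VOWEL + TONE)


def find_cvt(text):
    """Return every consonant + vowel + tone mark run, matched per codepoint."""
    return CVT.findall(text)


def rough_clusters(text):
    """Crude stand-in for grapheme segmentation: attach each nonspacing
    mark (category Mn) to the preceding character. Not full UAX #29."""
    clusters = []
    for ch in text:
        if clusters and unicodedata.category(ch) == "Mn":
            clusters[-1] += ch
        else:
            clusters.append(ch)
    return clusters


word = "\u0E17\u0E35\u0E48\u0E19\u0E35\u0E48"  # Thai "thi ni" ("here")
print(len(word), find_cvt(word))       # six codepoints, two matches
print(len(rough_clusters(word)))       # the same text as cluster units
```

Matched codepoint by codepoint, the correctly spelled word yields two
three-codepoint hits. Segmented into clusters first, the same word is
just two units, so a pattern that needs three consecutive units has
nothing to match inside either of them.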

Richard.
Received on Fri Oct 11 2019 - 13:19:11 CDT
