Re: Possible bug in formal grammar for extended grapheme cluster

From: Mark Davis ☕️ via Unicode <unicode_at_unicode.org>
Date: Sun, 17 Dec 2017 18:17:57 +0100

Thanks for the feedback. You're correct about this; that is a holdover from
an earlier version of the document when there was a more basic treatment of
RI sequences.

There is already an action to modify these. There is a placeholder review
note about that just above

http://www.unicode.org/reports/tr29/proposed.html#Table_Combining_Char_Sequences_and_Grapheme_Clusters

(scroll up just a bit).

Mark

Mark <https://twitter.com/mark_e_davis>

On Sun, Dec 17, 2017 at 4:16 PM, David P. Kendal via Unicode <
unicode_at_unicode.org> wrote:

> Hi,
>
> It’s possible I’m missing something, but the formal grammar/regular
> expression given for extended grapheme clusters appears to have a bug
> in it.
> <https://unicode.org/reports/tr29/#Table_Combining_Char_Sequences_and_Grapheme_Clusters>
>
> The bug is here:
>
> RI-Sequence := Regional_Indicator+
>
> If the formal grammar is intended to exactly match the rules given in
> the “Grapheme Cluster Boundary Rules” section below it as-is, then
> this should be
>
> RI-Sequence := Regional_Indicator Regional_Indicator
>
> since as given it would cause any number of RI characters to coalesce
> into a single grapheme cluster, instead of pairs of characters. That
> is, the text U+1F1EC U+1F1E7 U+1F1EA U+1F1FA would represent one
> grapheme cluster instead of the correct two.
>
> --
> dpk (David P. Kendal) · Nassauische Str. 36, 10717 DE · http://dpk.io/
> we do these things not because they are easy, +49 159 03847809
> but because we thought they were going to be easy
> — ‘The Programmers’ Credo’, Maciej Cegłowski
>
>
>
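The difference between the two productions quoted above can be checked with a small sketch. It is an illustration only, not the text of UAX #29: the names RI, greedy and paired are made up here, and it assumes Regional_Indicator covers U+1F1E6..U+1F1FF. The paired pattern also skips a trailing unpaired RI character, which a full segmenter would presumably still report as a cluster of its own.

```python
import re

# Assumption: Regional_Indicator is the range U+1F1E6..U+1F1FF.
RI = "[\U0001F1E6-\U0001F1FF]"

# Production as published: any run of RI characters is one cluster.
greedy = re.compile(f"{RI}+")

# Production as proposed in the message: exactly two RI characters per cluster.
paired = re.compile(f"{RI}{RI}")

# The example from the message: U+1F1EC U+1F1E7 U+1F1EA U+1F1FA.
text = "\U0001F1EC\U0001F1E7\U0001F1EA\U0001F1FA"

greedy_clusters = greedy.findall(text)
paired_clusters = paired.findall(text)

print(len(greedy_clusters))  # 1: all four characters coalesce
print(len(paired_clusters))  # 2: one cluster per pair
```

With `RI+` the four code points come back as a single match, while the two-character production splits them into U+1F1EC U+1F1E7 and U+1F1EA U+1F1FA, which is the behaviour the boundary rules describe.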
Received on Sun Dec 17 2017 - 11:18:40 CST
