Re: Extended grapheme cluster stability

From: Martinho Fernandes <rmf_at_rmf.io>
Date: Tue, 22 May 2018 14:43:23 +0200

On 22.05.18 12:51, Martinho Fernandes via Unicode wrote:

> Hello,
>
> None of the *_Break properties are stable, as far as I can see in
> https://www.unicode.org/policies/stability_policy.html. If I understand
> correctly, this means that, at least in theory, it is possible that in
> Unicode version X a sequence of characters AB forms an extended grapheme
> cluster, i.e. A × B in the notation used in the algorithm description
> and in the test data, but then in Unicode version X+1, that changes to A
> ÷ B.
>
> Am I reading this correctly or is this not possible? Or is it possible
> in theory but not in practice? Or maybe it has happened before?
>
Hmm, to answer my own question: yes, this has happened before. In
Unicode 8 there were no breaks between regional indicators. As of
Unicode 9, breaks are suppressed only "between regional indicator (RI)
symbols if there is an odd number of RI characters before the break
point", so a break is now allowed after every complete RI pair. It has
also happened in the direction break => no break, when emoji ZWJ
sequences were introduced.
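
To make the regional indicator change concrete, here is a minimal
sketch that models only the RI rules described above and treats every
other character as its own cluster. The function names are hypothetical
(they do not come from any library), and this is not a full
implementation of the segmentation algorithm.

```python
def is_regional_indicator(ch: str) -> bool:
    # Regional indicator symbols occupy U+1F1E6..U+1F1FF.
    return 0x1F1E6 <= ord(ch) <= 0x1F1FF


def ri_clusters_unicode8(text: str) -> list[str]:
    """Old behaviour: never break between two adjacent RI characters."""
    clusters: list[str] = []
    for ch in text:
        prev_is_ri = bool(clusters) and is_regional_indicator(clusters[-1][-1])
        if prev_is_ri and is_regional_indicator(ch):
            clusters[-1] += ch      # RI x RI
        else:
            clusters.append(ch)     # simplification: break everywhere else
    return clusters


def ri_clusters_unicode9(text: str) -> list[str]:
    """Newer behaviour: no break between RIs only if an odd number of
    RI characters immediately precedes the candidate break point."""
    clusters: list[str] = []
    ri_run = 0  # consecutive RI characters seen just before this position
    for ch in text:
        if is_regional_indicator(ch):
            if ri_run % 2 == 1:
                clusters[-1] += ch  # completes a pair: RI x RI
            else:
                clusters.append(ch) # starts a new pair: RI / RI
            ri_run += 1
        else:
            clusters.append(ch)     # simplification: break everywhere else
            ri_run = 0
    return clusters


# Four RI characters: the letters D, E, F, R as regional indicators.
flags = "\U0001F1E9\U0001F1EA\U0001F1EB\U0001F1F7"
print(len(ri_clusters_unicode8(flags)))  # one cluster under the old rule
print(len(ri_clusters_unicode9(flags)))  # two clusters under the new rule
```

The position between the second and third RI is A × B under the old
rule and A ÷ B under the new one, which is exactly the kind of change
the question was about.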

-- 
Martinho

Received on Tue May 22 2018 - 07:44:56 CDT