Re: Unicode "no-op" Character? from Richard Wordingham via Unicode on 2019-06-23 (Unicode Mail List Archive)

From: Richard Wordingham via Unicode <unicode_at_unicode.org>
Date: Sun, 23 Jun 2019 09:24:50 +0100

On Sat, 22 Jun 2019 23:56:50 +0000
Shawn Steele via Unicode <unicode_at_unicode.org> wrote:

> + the list. For some reason the list's reply header is confusing.
>
> From: Shawn Steele
> Sent: Saturday, June 22, 2019 4:55 PM
> To: Sławomir Osipiuk <sosipiuk_at_gmail.com>
> Subject: RE: Unicode "no-op" Character?
>
> The original comment about putting it between the base character and
> the combining diacritic seems peculiar. I'm having a hard time
> visualizing how that kind of markup could be interesting?

There are a number of possible interesting scenarios:

1) Chopping the string into user-perceived characters. For example,
the Khmer sequences of COENG plus letter are named sequences. Akin to
this is identifying resting places for a simple cursor, e.g. allowing it
to be positioned between a base character and a spacing, unreordered
subscript. (This last possibility overlaps with rendering.)

2) Chopping the string into collating elements. (This can require
renormalisation, and may raise a rendering issue with HarfBuzz, where
renormalisation is required to get marks into a suitable order for
shaping. I suspect no-op characters would disrupt this
renormalisation; CGJ may legitimately be used to affect rendering this
way, even though it is supposed to have no other effect* on rendering.)

3) Chopping the string into default grapheme clusters. That
separates a coeng from the following character with which it
interacts.
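
A rough illustration of points 2 and 3, using only Python's standard
unicodedata module. It is a sketch under stated assumptions, not how any
particular renderer or segmenter works. The first part shows that CGJ,
whose canonical combining class is 0, stops two marks from being swapped
into canonical order during normalisation; a hypothetical no-op character
with combining class 0 would presumably block reordering in the same way.
The second part uses crude_clusters, a helper made up for this example: it
only approximates default grapheme clusters by attaching every mark
(category M*) to the preceding character, and it does not implement the
UAX #29 rules.

```python
import unicodedata

CGJ = "\u034F"        # COMBINING GRAPHEME JOINER, canonical combining class 0
DOT_ABOVE = "\u0307"  # COMBINING DOT ABOVE, combining class 230
DOT_BELOW = "\u0323"  # COMBINING DOT BELOW, combining class 220
COENG = "\u17D2"      # KHMER SIGN COENG, general category Mn
KA = "\u1780"         # KHMER LETTER KA, general category Lo


def nfd(s):
    """Canonical decomposition, which includes canonical reordering of marks."""
    return unicodedata.normalize("NFD", s)


def crude_clusters(s):
    """Hypothetical helper: attach each mark to the preceding character.

    This is only an approximation of default grapheme clusters,
    NOT an implementation of UAX #29.
    """
    clusters = []
    for ch in s:
        if clusters and unicodedata.category(ch).startswith("M"):
            clusters[-1] += ch  # marks extend the current cluster
        else:
            clusters.append(ch)  # anything else starts a new cluster
    return clusters


# Point 2: marks written in non-canonical order (230 before 220).
plain = "a" + DOT_ABOVE + DOT_BELOW
blocked = "a" + DOT_ABOVE + CGJ + DOT_BELOW

reordered = nfd(plain)    # the two marks are swapped into canonical order
unchanged = nfd(blocked)  # CGJ sits between the marks, so no swap happens

# Point 3: the coeng stays with the preceding letter, and the letter it
# interacts with lands in the next cluster.
khmer = crude_clusters(KA + COENG + KA)
```

Here reordered differs from plain, while unchanged is identical to
blocked; khmer has two elements, with the coeng in the first one.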

*Is a Unicode-compliant *renderer* allowed to distinguish diaeresis
from the umlaut mark?

Richard.
Received on Sun Jun 23 2019 - 03:25:21 CDT
