On Sat, 22 Jun 2019 23:56:50 +0000
Shawn Steele via Unicode <unicode_at_unicode.org> wrote:
> + the list. For some reason the list's reply header is confusing.
>
> From: Shawn Steele
> Sent: Saturday, June 22, 2019 4:55 PM
> To: Sławomir Osipiuk <sosipiuk_at_gmail.com>
> Subject: RE: Unicode "no-op" Character?
>
> The original comment about putting it between the base character and
> the combining diacritic seems peculiar. I'm having a hard time
> visualizing how that kind of markup could be interesting?
There are a number of possible interesting scenarios:
1) Chopping the string into user-perceived characters. For example,
the Khmer sequences of COENG plus letter are named sequences. Akin to
this is identifying resting places for a simple cursor, e.g. allowing it
to be positioned between a base character and a spacing, unreordered
subscript. (This last possibility overlaps with rendering.)
2) Chopping the string into collating elements. (This can require
renormalisation, and may raise a rendering issue with HarfBuzz, where
renormalisation is required to get marks into a suitable order for
shaping. I suspect no-op characters would disrupt this
renormalisation; CGJ may legitimately be used to affect rendering this
way, even though it is supposed to have no other effect* on rendering.)
3) Chopping the string into default grapheme clusters. That
separates a coeng from the following character with which it
interacts.
*Is a Unicode-compliant *renderer* allowed to distinguish diaeresis
from the umlaut mark?
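
To illustrate point 2, here is a minimal sketch of how a character with
canonical combining class 0 sitting between two marks stops canonical
reordering. It assumes Python's standard unicodedata module; the choice
of marks (diaeresis above, dot below) and the helper name
nfd_codepoints are mine, purely for illustration. A hypothetical no-op
character would presumably behave like CGJ here unless normalisation
were specially defined to skip it.

```python
import unicodedata

CGJ = "\u034f"        # COMBINING GRAPHEME JOINER, canonical combining class 0
DIAERESIS = "\u0308"  # COMBINING DIAERESIS, canonical combining class 230
DOT_BELOW = "\u0323"  # COMBINING DOT BELOW, canonical combining class 220


def nfd_codepoints(s):
    """Return the NFD form of s as a list of U+XXXX labels."""
    return [f"U+{ord(c):04X}" for c in unicodedata.normalize("NFD", s)]


# Marks typed in "wrong" canonical order: NFD swaps them (220 before 230).
plain = "a" + DIAERESIS + DOT_BELOW

# Same marks with CGJ between them: CGJ has class 0, so no swap occurs.
blocked = "a" + DIAERESIS + CGJ + DOT_BELOW

print(nfd_codepoints(plain))    # ['U+0061', 'U+0323', 'U+0308']
print(nfd_codepoints(blocked))  # ['U+0061', 'U+0308', 'U+034F', 'U+0323']
```

The second result is the disruption in question: a shaper that relies on
normalisation to bring the marks into a workable order never sees them
adjacent.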
Richard.
Received on Sun Jun 23 2019 - 03:25:21 CDT