Re: Do `Grapheme_Extend` characters only apply to `Grapheme_Base`? from Mathias Bynens on 2014-04-24 (Unicode Mail List Archive)

From: Mathias Bynens <mathias_at_qiwi.be>
Date: Thu, 24 Apr 2014 09:58:31 +0200

On 23 Apr 2014, at 22:16, Mathias Bynens <mathias_at_qiwi.be> wrote:

> Let’s say I’m writing a program that strips combining characters and grapheme extenders from an input string.
>
> For combining marks, I’m looking for any non-combining mark (e.g. `a`) followed by one or more combining marks (e.g. `̃`), and then I remove everything but the non-combining mark (leaving only `a`). Is this a correct approach?
>
> What should the approach be for grapheme extenders? Should the program only look for `Grapheme_Base` characters followed by `Grapheme_Extend` characters (which includes the code points in `Other_Grapheme_Extend`)?
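
The combining-mark approach in the quoted message can be sketched roughly as follows. This is only an illustration under assumptions the message does not state: it uses Python and the standard `unicodedata` module, it treats "combining mark" as General_Category `Mn`, `Mc`, or `Me`, and the function names are hypothetical.

```python
import unicodedata


def is_combining_mark(ch: str) -> bool:
    # Assumption: "combining mark" means General_Category Mn, Mc, or Me.
    return unicodedata.category(ch).startswith("M")


def strip_marks_after_base(text: str) -> str:
    """Remove combining marks that follow a non-combining mark.

    Marks at the very start of the string have nothing to attach to,
    so this sketch leaves them untouched.
    """
    out = []
    seen_non_mark = False
    for ch in text:
        if is_combining_mark(ch):
            if seen_non_mark:
                continue  # mark attached (directly or via other marks) to a base
        else:
            seen_non_mark = True
        out.append(ch)
    return "".join(out)


if __name__ == "__main__":
    # "a" followed by U+0303 COMBINING TILDE
    print(strip_marks_after_base("a\u0303"))
```

Precomposed characters such as U+00E3 are not affected by this sketch unless the input is first decomposed, for example with `unicodedata.normalize("NFD", text)`.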

The email subject should have been “Do `Grapheme_Extend` characters only apply to `Grapheme_Base`?” — sorry for the confusion.

Does anyone know the answer?
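
For comparison, here is a rough sketch of a `Grapheme_Extend` test. It relies on the derivation given in `DerivedCoreProperties.txt` (`Grapheme_Extend` = `Me` + `Mn` + `Other_Grapheme_Extend`). Python's standard library does not expose `Other_Grapheme_Extend`, so the set below is a small, deliberately incomplete sample chosen for illustration; a real program would load the full list from `PropList.txt`. The function names are hypothetical, and the sketch drops extenders wherever they occur, which may or may not be the right answer to the question above.

```python
import unicodedata

# Illustrative, incomplete sample of Other_Grapheme_Extend code points:
# U+200C ZERO WIDTH NON-JOINER and U+FF9E..U+FF9F (halfwidth katakana
# voiced and semi-voiced sound marks).
OTHER_GRAPHEME_EXTEND_SAMPLE = frozenset({0x200C, 0xFF9E, 0xFF9F})


def is_grapheme_extend(ch: str) -> bool:
    # Grapheme_Extend = Me + Mn + Other_Grapheme_Extend (sampled above).
    return (
        unicodedata.category(ch) in ("Mn", "Me")
        or ord(ch) in OTHER_GRAPHEME_EXTEND_SAMPLE
    )


def strip_grapheme_extenders(text: str) -> str:
    # Assumption: remove every extender, regardless of what precedes it.
    return "".join(ch for ch in text if not is_grapheme_extend(ch))
```

Note that spacing marks (`Mc`) such as U+0903 are combining marks but are not `Grapheme_Extend` unless they are listed in `Other_Grapheme_Extend`, so the two approaches do not remove the same set of characters.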
Received on Thu Apr 24 2014 - 02:59:40 CDT