Re: Difference between 'combining characters' and 'grapheme extenders'?

From: Philippe Verdy <verdy_p_at_wanadoo.fr>
Date: Thu, 20 Feb 2014 12:10:09 +0100

Many grapheme extenders are not "combining characters". Combining
characters are classified as such for legacy reasons (via the very weak
"general category" property), and this property is normatively stabilized.
Also, many combining characters have a non-zero combining class, and these
values are stabilized for the purpose of normalization.
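
To make the two properties concrete, here is a minimal Python sketch using
the standard unicodedata module. The sample characters and the helper name
describe() are my own picks for illustration, not anything from the
standard. It prints the general category and canonical combining class of a
few marks, then shows normalization reordering two marks by combining class.

```python
import unicodedata

# Illustrative sample (an assumption of this sketch): acute accent,
# dot below, Devanagari visarga, enclosing circle.
SAMPLES = ["\u0301", "\u0323", "\u0903", "\u20DD"]

def describe(ch):
    """Return (code point, general category, canonical combining class)."""
    return (f"U+{ord(ch):04X}",
            unicodedata.category(ch),
            unicodedata.combining(ch))

for ch in SAMPLES:
    print(describe(ch))

# Canonical ordering sorts adjacent marks by combining class, so
# acute (230) followed by dot below (220) gets swapped in NFD.
decomposed = unicodedata.normalize("NFD", "a\u0301\u0323")
print([f"U+{ord(c):04X}" for c in decomposed])
```

Note that U+0903 and U+20DD are combining marks (categories Mc and Me) even
though their combining class is 0, so the two properties do not coincide.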

Grapheme extenders also include characters that are NOT combining
characters but format controls (e.g. joiners). Grapheme clusters are also
more complex in some scripts (there are extenders encoded BEFORE the base
character, and they cannot be classified as combining characters because
combining characters are always encoded AFTER a base character).
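
As a rough check of the joiner case, the sketch below tests whether a
character is a combining mark by general category. The helper name
is_combining_mark() is hypothetical. As far as I know, Python's unicodedata
module does not expose the Grapheme_Extend property itself, so this only
shows the "not a combining character" half; that ZWNJ is a grapheme
extender has to be looked up in the UCD data files.

```python
import unicodedata

def is_combining_mark(ch):
    """True if the general category is Mn, Mc or Me."""
    return unicodedata.category(ch).startswith("M")

ZWNJ = "\u200C"   # ZERO WIDTH NON-JOINER
ACUTE = "\u0301"  # COMBINING ACUTE ACCENT

for ch in (ZWNJ, ACUTE):
    print(unicodedata.name(ch),
          unicodedata.category(ch),
          is_combining_mark(ch))
```

The joiner comes out as category Cf (format), so it is not a combining
character in the general-category sense.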

For legacy reasons (and round-trip compatibility with older standards), not
all scripts are encoded according to the UCS character model with combining
characters. One example is the Thai script, which does not follow the
"logical" encoding order but the model used in TIS-620 and other standards
based on it, including on Windows and *nix/*nux.
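
The Thai case can be sketched in Python as follows. The two sample
syllables are my own choice, and props() is a hypothetical helper. In the
Thai syllable the vowel sign SARA E is stored BEFORE the consonant and is an
ordinary letter (Lo); in the Devanagari syllable the vowel sign I is also
displayed to the left, but it is stored AFTER the consonant as a combining
mark (Mc).

```python
import unicodedata

def props(text):
    """List (name, general category, combining class) per code point."""
    return [(unicodedata.name(ch),
             unicodedata.category(ch),
             unicodedata.combining(ch)) for ch in text]

thai = "\u0E40\u0E01"        # SARA E + KO KAI (visual order)
devanagari = "\u0915\u093F"  # KA + VOWEL SIGN I (logical order)

for row in props(thai) + props(devanagari):
    print(row)

# Neither character of the Thai pair has a combining class, so
# normalization leaves the visual order untouched.
thai_nfc = unicodedata.normalize("NFC", thai)
```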

2014-02-20 11:42 GMT+01:00 Mathias Bynens <mathias_at_qiwi.be>:

> What is the difference between 'combining characters' (
> http://www.unicode.org/faq/char_combmark.html) and 'grapheme extenders' (
> http://www.unicode.org/reports/tr44/#Grapheme_Extend) in Unicode?
>
> They seem to do the same thing, as far as I can tell - although the set of
> grapheme extenders is larger than the set of combining characters. I'm
> clearly missing something here. Why the distinction?
>
> I've also posted this question on Stack Overflow:
> http://stackoverflow.com/q/21722729/96656
> _______________________________________________
> Unicode mailing list
> Unicode_at_unicode.org
> http://unicode.org/mailman/listinfo/unicode
>

Received on Thu Feb 20 2014 - 05:11:41 CST
