2013/5/18 Richard Wordingham <richard.wordingham_at_ntlworld.com>
> On Sat, 18 May 2013 02:02:07 +0200
> Philippe Verdy <verdy_p_at_wanadoo.fr> wrote:
>
> > Yes, it is expected. In fact it has long been very common in Unicode
> > (there are many "Mn" marks with combining class 0, and this is
> > not limited to one script).
> >
> > A combining class 0 DOES NOT mean that the character will not be a
> > non-spacing mark (or that it will be spacing), but just that it blocks
> > reorderings under standard normalizations and for recognizing
> > canonical equivalences.
> >
> > (see for example CGJ which also has combining class 0 and which is
> > used mostly to insert such blocking behavior, without having any real
> > semantic meaning by itself ; once the normalization step has been
> > done, it can be discarded from the input stream in renderers or
> > collators, except for special purposes like rendering CGJ with its
> > own visible glyph in some "visible controls" edit mode)
>
> It cannot be discarded when collation is used for sorting.
It can, after the initial normalization steps (it also blocks the
recognition of digrams as a single "letter" in some alphabets, but the use
of CGJ for that purpose is deprecated in favor of using joiner controls
before another starter character).
Once digrams have been isolated, the CGJ is discarded for collation
purposes as well (it becomes ignorable).
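
To make the blocking behavior concrete, here is a minimal sketch using
Python's standard unicodedata module. It only illustrates the normalization
step: the standard library has no UCA collation, so the "discard" is shown
as a plain string replacement. The helper name strip_cgj and the sample
strings are my own assumptions, chosen only for illustration.

```python
import unicodedata

CGJ = "\u034f"  # COMBINING GRAPHEME JOINER: general category Mn, combining class 0


def strip_cgj(text: str) -> str:
    """Hypothetical helper: drop every CGJ from an already-normalized string."""
    return text.replace(CGJ, "")


def show(text: str) -> list:
    """Format a string as a list of U+XXXX code points for printing."""
    return [f"U+{ord(c):04X}" for c in text]


# COMBINING ACUTE ACCENT (ccc 230) typed before COMBINING DOT BELOW (ccc 220).
plain = "a\u0301\u0323"
# Same marks, but with a CGJ between them.
blocked = "a\u0301" + CGJ + "\u0323"

# Canonical ordering sorts adjacent marks by combining class, but never
# moves a mark across a character whose combining class is 0.
nfd_plain = unicodedata.normalize("NFD", plain)
nfd_blocked = unicodedata.normalize("NFD", blocked)

print("without CGJ:", show(nfd_plain))
print("with CGJ:   ", show(nfd_blocked))
print("CGJ removed after normalization:", show(strip_cgj(nfd_blocked)))
```

Without the CGJ the two marks are reordered; with it, the typed order
survives normalization, and it still survives if the CGJ is removed
afterwards, as long as the result is not normalized again.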
You'll still find exceptions in some tailorings, but tailorings can in fact
do what they want; I don't want to go into the long list of exceptions, or
into why there are so many differences between the default grapheme
clusters and the actual grapheme clusters needed in some languages and
recognized by tailorings. There are more details about this in the
specification for text boundaries (UAX #29), which requires more
properties than just the general category and the combining class.
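
That the general category and the combining class are independent
properties can be checked directly. The following is a rough sketch, again
with Python's unicodedata module, that lists the nonspacing marks (Mn) whose
combining class is 0. The function name mn_with_ccc_zero is made up for
this example, and the exact count depends on the Unicode version bundled
with the interpreter.

```python
import sys
import unicodedata


def mn_with_ccc_zero():
    """Yield each code point that is a nonspacing mark (Mn) with combining class 0."""
    for cp in range(sys.maxunicode + 1):
        ch = chr(cp)
        if unicodedata.category(ch) == "Mn" and unicodedata.combining(ch) == 0:
            yield cp


marks = list(mn_with_ccc_zero())
print(len(marks), "Mn characters with combining class 0 in Unicode",
      unicodedata.unidata_version)

# Print a few of them; the name lookup falls back to a placeholder if needed.
for cp in marks[:5]:
    print(f"U+{cp:04X}", unicodedata.name(chr(cp), "<unnamed>"))
```

CGJ (U+034F) is in that list, and so are many Indic vowel signs such as
U+0941, while an ordinary accent like U+0301 (combining class 230) is not.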
Received on Sat May 18 2013 - 02:23:15 CDT