Re: ZWJ, ZWNJ, CGJ and combination

From: Philippe Verdy ([email protected])
Date: Mon Nov 10 2003 - 08:29:26 EST

Next message: Philippe Verdy: "Re: Tamil 0BB3 and 0BD7"

Previous message: Michael Everson: "Re: Ciphers (Was: Berber/Tifinagh)"
In reply to: Peter Kirk: "Re: ZWJ, ZWNJ, CGJ and combination"
Next in thread: Peter Kirk: "Re: ZWJ, ZWNJ, CGJ and combination"
Reply: Peter Kirk: "Re: ZWJ, ZWNJ, CGJ and combination"

From: "Peter Kirk" <[email protected]>
> This does not affect my argument. A combining character sequence, as
> defined, does not perfectly fit your definition "an unordered set of
> sequences of characters having the same combining class." But it is
> preserved under canonical normalisation. Well, perhaps that depends what
> you mean by "preserved". If you mean that its code point representation
> is unchanged, that is not true of your starter sequences either. If it
> means that its semantics are unchanged, it is true by definition of any
> string of Unicode characters that its semantics are unchanged by
> canonical normalisation, or indeed by any transformation into a
> canonically equivalent form.

I did not claim the opposite (that normalization could change semantics).
But normalization does not operate at the level of combining character
sequences; it operates at the level of starter sequences, and it bases
character identity on that model. The model places no semantic constraint
on a run of several non-defective starter sequences (such as several
non-defective combining sequences). In effect, normalization ignores
combining sequences, which are only a part of the text.
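
As an illustration of the unit described above, here is a minimal Python
sketch (not part of the original exchange). It assumes a "starter sequence"
means one character of combining class 0 followed by every character of
class > 0 after it; starter_sequences is a hypothetical helper name, not a
Unicode or Python API. Only unicodedata.combining and unicodedata.normalize
come from the standard library.

```python
import unicodedata

def starter_sequences(text):
    """Split text into runs of one starter (ccc == 0) plus the
    following characters whose combining class is > 0.
    Hypothetical helper, named after the term used in this thread."""
    runs = []
    for ch in text:
        if unicodedata.combining(ch) == 0 or not runs:
            runs.append(ch)   # a starter (or a leading non-starter) opens a run
        else:
            runs[-1] += ch    # a non-starter attaches to the current run
    return runs

# a + acute (ccc 230) + dot below (ccc 220), then b + dot below
text = "a\u0301\u0323b\u0323"
runs = starter_sequences(text)

# Canonical reordering sorts the marks inside a run by combining class;
# in this sample no mark moves across a starter.
nfd_runs = [unicodedata.normalize("NFD", run) for run in runs]
```

For this sample, normalizing each run separately gives the same result as
normalizing the whole string. That is shown for this input only, not as a
general theorem.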

> >I still maintain that there's no terminology to designate what I call a
> >starter sequence.
>
> Agreed. But does it matter? It does so only if this is a meaningful unit
> within Unicode. On my understanding, a sequence of combining characters
> all of class >0 is meaningful because this is what canonical reordering
> operates on. But such a sequence does not necessarily form a unit with
> the preceding character.

It does matter, because the lack of a term makes the terminology ambiguous
when speaking about normalization constraints under the stability policy,
and because those constraints limit how combining sequences can be safely
encoded (when one needs multiple starters, such as a base character and CGJ).
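
The CGJ case can be checked with a short Python sketch (an illustration
added here, not taken from the original message). It relies only on the
standard unicodedata module; the sample strings and variable names are
chosen for the example.

```python
import unicodedata

CGJ = "\u034f"  # COMBINING GRAPHEME JOINER, combining class 0

# a, acute (ccc 230), dot below (ccc 220): the marks are out of canonical order
plain = "a\u0301\u0323"
# same marks, but with CGJ between them
with_cgj = "a\u0301" + CGJ + "\u0323"

nfd_plain = unicodedata.normalize("NFD", plain)   # marks get reordered
nfd_cgj = unicodedata.normalize("NFD", with_cgj)  # CGJ blocks the reordering
```

Because CGJ has combining class 0, it acts as a starter: the encoded
sequence with CGJ is split into two runs for canonical reordering, even
though a reader may regard it as one combining sequence on the base "a".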


This archive was generated by hypermail 2.1.5 : Mon Nov 10 2003 - 08:58:34 EST