From: Philippe Verdy (verdy_p@wanadoo.fr)
Date: Mon Nov 10 2003 - 08:29:26 EST
From: "Peter Kirk" <peterkirk@qaya.org>
> This does not affect my argument. A combining character sequence, as
> defined, does not perfectly fit your definition "an unordered set of
> sequences of characters having the same combining class." But it is
> preserved under canonical normalisation. Well, perhaps that depends on what
> you mean by "preserved". If you mean that its code point representation
> is unchanged, that is not true of your starter sequences either. If it
> means that its semantics are unchanged, it is true by definition of any
> string of Unicode characters that its semantics are unchanged by
> canonical normalisation, or indeed by any transformation into a
> canonically equivalent form.
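To illustrate the point quoted above: a minimal Python sketch, using only the standard `unicodedata` module, showing that normalization can change the code point representation of a combining character sequence while the result stays canonically equivalent. The helper name `canonically_equivalent` is hypothetical; it relies on the fact that two strings are canonically equivalent exactly when their NFD forms are identical.

```python
import unicodedata

def canonically_equivalent(a: str, b: str) -> bool:
    # Hypothetical helper: canonical equivalence <=> identical NFD forms.
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)

# "e" + COMBINING ACUTE ACCENT (U+0301): one combining character sequence,
# two code points.
decomposed = "e\u0301"
composed = unicodedata.normalize("NFC", decomposed)  # becomes U+00E9

print([f"U+{ord(c):04X}" for c in decomposed])
print([f"U+{ord(c):04X}" for c in composed])
print(canonically_equivalent(decomposed, composed))
```

The code points differ (two versus one), yet the two strings are interchangeable under canonical equivalence.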
I did not say the opposite (that normalization could change semantics).
But normalization does not operate at the level of the combining character
sequence; it operates at the level of the starter sequence, and it bases
character identity on this model, which exempts sequences of non-defective
starter sequences (such as non-defective combining sequences) from any
semantic constraint. Normalization effectively ignores combining sequences,
which are only a part of the text.
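A minimal Python sketch of the difference between the two units, under stated assumptions: a "starter" is taken to be any character with Canonical_Combining_Class 0 (as in UAX #15), and "combining character" is approximated by General_Category M* (Mn, Mc, Me), ignoring finer points of the standard's definition of a combining character sequence. The function names `starter_sequences` and `combining_sequences` are hypothetical. COMBINING GRAPHEME JOINER (U+034F) is a mark but also a starter, so one combining character sequence can span two starter sequences.

```python
import unicodedata

def starter_sequences(text: str) -> list[str]:
    """Hypothetical: split text into runs that each begin at a starter (ccc == 0)."""
    runs: list[str] = []
    for ch in text:
        if unicodedata.combining(ch) == 0 or not runs:
            runs.append(ch)        # a starter opens a new run
        else:
            runs[-1] += ch         # ccc > 0 attaches to the current run
    return runs

def combining_sequences(text: str) -> list[str]:
    """Hypothetical, simplified: a non-mark base followed by marks (General_Category M*)."""
    runs: list[str] = []
    for ch in text:
        is_mark = unicodedata.category(ch).startswith("M")
        if not is_mark or not runs:
            runs.append(ch)        # a base character opens a new sequence
        else:
            runs[-1] += ch         # any mark, including CGJ, extends it
    return runs

# a + acute (ccc 230) + CGJ (Mn, ccc 0) + grave below (ccc 220), then b
text = "a\u0301\u034f\u0316b"
print(starter_sequences(text))    # three runs: CGJ starts a new one
print(combining_sequences(text))  # two runs: CGJ stays inside the first
```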
> >I still maintain that there's no terminology to designate what I call a
> >starter sequence.
> Agreed. But does it matter? It does so only if this is a meaningful unit
> within Unicode. On my understanding, a sequence of combining characters
> all of class >0 is meaningful because this is what canonical reordering
> operates on. But such a sequence does not necessarily form a unit with
> the preceding character.
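The canonical reordering step referred to above can be sketched as a stable sort of each maximal run of characters with combining class greater than 0, with starters left in place as barriers. This is a minimal Python sketch assuming already-decomposed input; full NFD also applies canonical decomposition first, and the name `canonical_reorder` is hypothetical.

```python
import unicodedata

def canonical_reorder(text: str) -> str:
    """Hypothetical sketch: stably sort each run of ccc > 0 characters by ccc.
    Characters with ccc == 0 are never moved and end the current run."""
    out: list[str] = []
    run: list[str] = []
    for ch in text:
        if unicodedata.combining(ch) > 0:
            run.append(ch)
        else:
            # sorted() is stable, so marks of equal class keep their order
            out.extend(sorted(run, key=unicodedata.combining))
            run = []
            out.append(ch)
    out.extend(sorted(run, key=unicodedata.combining))
    return "".join(out)

# a + acute (ccc 230) + dot below (ccc 220): the run is reordered to 220, 230
s = "a\u0301\u0323"
print(canonical_reorder(s) == unicodedata.normalize("NFD", s))
```

Since no character here has a canonical decomposition, the sketch agrees with the library's NFD on this input.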
It does matter, because the lack of such a term causes ambiguity when
speaking about normalization constraints under the stability policy, and it
constrains the way combining sequences can be safely encoded (when this
requires multiple starters, such as a base character and CGJ).
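The CGJ case can be checked with Python's standard `unicodedata` module. This is a minimal sketch: the variable names are illustrative, and it only shows that because CGJ (U+034F) has combining class 0, normalization treats the marks on either side of it as belonging to separate starter sequences, both for reordering and for composition.

```python
import unicodedata

CGJ = "\u034f"  # COMBINING GRAPHEME JOINER: a mark (Mn) whose ccc is 0

plain = "a\u0301\u0316"               # acute (ccc 230), grave below (ccc 220)
blocked = "a\u0301" + CGJ + "\u0316"  # same marks, separated by CGJ

nfd_plain = unicodedata.normalize("NFD", plain)      # marks reordered
nfd_blocked = unicodedata.normalize("NFD", blocked)  # CGJ prevents reordering
nfc_blocked = unicodedata.normalize("NFC", blocked)  # a + acute composes; CGJ run untouched

print(nfd_plain == "a\u0316\u0301")
print(nfd_blocked == blocked)
print(nfc_blocked == "\u00e1" + CGJ + "\u0316")
```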
This archive was generated by hypermail 2.1.5 : Mon Nov 10 2003 - 08:58:34 EST