From: Philippe Verdy (verdy_p@wanadoo.fr)
Date: Fri Jun 27 2003 - 11:35:03 EDT
On Friday, June 27, 2003 4:40 PM, John Cowan <jcowan@reutershealth.com> wrote:
> Not so. Sometimes stability is more important than correctness.
Very well answered. I don't see why we need to sacrifice stability to
correct something. Since the error is not in ISO10646, it is definitely not
reasonable to ask ISO10646 to endorse an error made by Unicode merely
because of Unicode's stability pact.
For now, the only good solution is to use existing Unicode-only resources
that affect neither the normalization stability pact nor the ISO10646
unification work. If this requires defining some additional Unicode semantics
or properties for some language-significant markup characters, this can be
done with variants (if ISO10646 accepts it), or by asking ISO10646 for a
dedicated new *invisible* diacritic in the Hebrew block.
Maybe Unicode should be more prudent with Normalization Forms: when
new characters are added, their combining classes should be
documented as informative until there has been experimentation and
a consensus has been reached. This would not break the stability pact
with XML, which would simply not accept the new characters until
Unicode has stabilized their properties.
So the characters could be standardized by Unicode and ISO10646, but
be used with caution in XML, which can restrict its supported character
set to only those characters whose canonicalization is settled.
Why not, then, document these critical normative properties in a way
that clearly marks them as informative when needed?
For example, informative canonical decompositions could be tagged with
<canon> (and thus be applied only by compatibility decomposition
until further notice).
Similarly, proposed combining classes could be marked with an additional
symbol in the CC column of the UCD (for example a "?").
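To make the proposed notation concrete, here is a minimal sketch of how a tool might read UnicodeData.txt-style lines if these markers existed. The "?" suffix on the combining class and the "<canon>" tag are hypothetical conventions from the proposal above, not part of the real UCD format, and the U+E000 entry used in the usage note is an invented example, not a real character assignment.

```python
from dataclasses import dataclass

# Hypothetical conventions (NOT part of the real UCD format):
#   - a trailing "?" in the CC column (field 3) marks a provisional combining class;
#   - a "<canon>" tag on the decomposition (field 5) marks a provisional canonical
#     decomposition, applied only by compatibility decomposition until confirmed.

@dataclass
class CharProps:
    code: int
    name: str
    ccc: int
    ccc_provisional: bool
    decomposition: str         # field 5, with the hypothetical <canon> tag stripped
    decomp_provisional: bool

def parse_ucd_line(line: str) -> CharProps:
    fields = line.split(";")
    raw_ccc = fields[3]
    raw_decomp = fields[5]
    decomp_provisional = raw_decomp.startswith("<canon>")
    decomposition = raw_decomp[len("<canon>"):].strip() if decomp_provisional else raw_decomp
    return CharProps(
        code=int(fields[0], 16),
        name=fields[1],
        ccc=int(raw_ccc.rstrip("?")),
        ccc_provisional=raw_ccc.endswith("?"),
        decomposition=decomposition,
        decomp_provisional=decomp_provisional,
    )

def is_canonical_mapping(p: CharProps) -> bool:
    # In the real UCD, a non-empty untagged mapping is canonical; <canon> ones are not (yet).
    return bool(p.decomposition) and not p.decomp_provisional and not p.decomposition.startswith("<")

def stable_for_xml(p: CharProps) -> bool:
    # An XML application could accept only characters whose properties are all final.
    return not (p.ccc_provisional or p.decomp_provisional)
```

For instance, the real entry "05B7;HEBREW POINT PATAH;Mn;17;NSM;;;;;N;;;;;" parses as a stable character, while a hypothetical "E000;HYPOTHETICAL MARK;Mn;22?;NSM;<canon> 0041 0300;;;;N;;;;;" would be flagged as provisional and kept out of XML.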
This would prevent the character from being used in XML-compliant
applications, but it would allow faster development of fonts,
renderers and layout engines, and allow experimental encoding
of real new documents with some safeguards regarding the
characters' properties.
This would give the IETF and W3C a warning: "this character has
an informative combining class or decomposition; normalizing it
at this stage is dangerous, and documents should be considered
as already normalized with respect to this character."
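One way a normalizer could honour "consider documents already normalized for those characters" is sketched below. It implements only the canonical reordering step of normalization (a stable sort of each run of non-starters by combining class), and it assumes, as an illustrative choice rather than anything specified, that a character with a provisional class is treated as a barrier that is never moved. The `provisional` set is hypothetical input; real combining classes come from Python's `unicodedata`.

```python
import unicodedata

def canonical_reorder(text: str, provisional: frozenset = frozenset()) -> str:
    """Canonical reordering only (no decomposition or composition).

    Code points listed in `provisional` (hypothetical: characters whose combining
    class is still informative) are left in place and act as barriers.
    """
    out: list[str] = []
    run: list[str] = []  # current run of reorderable non-starters

    def flush() -> None:
        run.sort(key=unicodedata.combining)  # list.sort is stable: equal classes keep their order
        out.extend(run)
        run.clear()

    for ch in text:
        if unicodedata.combining(ch) != 0 and ord(ch) not in provisional:
            run.append(ch)
        else:
            flush()
            out.append(ch)
    flush()
    return "".join(out)
```

With the real Hebrew classes, lamed + patah (17) + hiriq (14) is reordered to lamed + hiriq + patah, exactly as NFD does; if hiriq were marked provisional, the sequence would be left as typed.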
These potentially unstable Unicode-encoded documents would then
be labelled with the Unicode version, since a future revision may
require verifying whether the informative properties have become
enforceable. If the properties change, existing documents can then
be tested to see whether they still conform to the proposed
normalization, and corrected if needed. If there is no change
after, say, one year, a revision annex would publish these properties
as normative and an incremental version of Unicode would be released,
allowing the encoded documents to be interchanged and archived
without an explicit Unicode version label.
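The re-testing step could be as simple as checking whether a version-labelled document is still in canonical order under the revised combining classes. This is a minimal sketch: the two tables are hypothetical stand-ins for the properties published with the labelled version and with the revision, and U+E000 is an invented provisional mark. The check uses the standard canonical-order condition (no adjacent pair with ccc(A) > ccc(B) > 0).

```python
from typing import Mapping

def is_canonically_ordered(text: str, ccc: Mapping[int, int]) -> bool:
    """True if no adjacent pair A, B has ccc[A] > ccc[B] > 0 (missing entries count as 0)."""
    prev = 0
    for ch in text:
        cur = ccc.get(ord(ch), 0)
        if cur != 0 and prev > cur:
            return False
        prev = cur
    return True

# Hypothetical tables: U+E000 had a provisional class 22 when the document was
# labelled, and a later revision changes it to 10 (U+05B7 patah is really 17).
labelled_version = {0x05B7: 17, 0xE000: 22}
revised_version = {0x05B7: 17, 0xE000: 10}

doc = "\u05D0\u05B7\uE000"  # alef + patah + hypothetical mark
needs_correction = (is_canonically_ordered(doc, labelled_version)
                    and not is_canonically_ordered(doc, revised_version))
```

Here `needs_correction` is true, so the document would have to be re-normalized before it could drop its version label.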