From: Philippe Verdy (verdy_p@wanadoo.fr)
Date: Fri Jun 27 2003 - 12:50:23 EDT
On Friday, June 27, 2003 6:01 PM, Philippe Verdy <verdy_p@wanadoo.fr>
wrote:
Given that XML will require normalization for texts identified as
being Unicode encoded (UTF-8 and others), couldn't a document be
labelled so that the normalization step is skipped during XML
processing, using an "ISO-10646-8" encoding name (for the UTF-8
encoding scheme)?
In that case, the label would mean that the document as a whole does
not adopt Unicode normalization, but still uses the same repertoire...
(So this would optionally remove a processing step for XML parsers,
which would apply normalization only on input, but not in their
internal processing, and not even in their output).
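
To make the idea concrete, here is a minimal Python sketch of a decoder
that skips normalization when the document carries the proposed label.
Everything specific in it is an assumption for illustration: the
decode_document helper is hypothetical, "ISO-10646-8" is only the name
suggested in this message (it is not a registered charset), and NFC is
picked arbitrarily as the normalization form.

```python
import unicodedata

# Hypothetical label proposed in this message; not a registered charset name.
UNNORMALIZED_LABELS = {"iso-10646-8"}

def decode_document(raw: bytes, encoding_label: str) -> str:
    """Decode UTF-8 bytes; normalize to NFC unless the label opts out."""
    text = raw.decode("utf-8")  # same encoding scheme under either label
    if encoding_label.lower() in UNNORMALIZED_LABELS:
        return text  # normalization step skipped, code points kept as-is
    return unicodedata.normalize("NFC", text)

# 'e' followed by U+0301 COMBINING ACUTE ACCENT
raw = "e\u0301".encode("utf-8")

as_unicode = decode_document(raw, "UTF-8")
as_10646 = decode_document(raw, "ISO-10646-8")

print([hex(ord(c)) for c in as_unicode])  # ['0xe9']
print([hex(ord(c)) for c in as_10646])    # ['0x65', '0x301']
```

Both results draw on the same repertoire; only the second keeps the
sequence exactly as it was written.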
Is this too tricky for the XML conformance requirements? Whose
standard would have to be adapted? In my view a document can fully
conform to ISO 10646 without conforming to Unicode if it does not want
to use the /implied/ Unicode properties such as combining classes
and Unicode normalization forms (and there are certainly other
interesting normalizations that could be useful for each language)...
The caveat would be a more complex font layout engine (with
larger tables for combining sequences) if texts can be encoded
without being normalized first...
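
As a rough illustration of why the tables would grow, the Python sketch
below lists five canonically equivalent spellings of a single letter
(U+1EC7, e with circumflex and dot below). The choice of letter and of
NFC is mine, purely as an example; a layout engine fed unnormalized
text would have to recognize every one of these sequences.

```python
import unicodedata

# Five canonically equivalent ways to encode the same letter.
spellings = [
    "\u1ec7",          # fully precomposed
    "\u1eb9\u0302",    # e-with-dot-below + combining circumflex
    "\u00ea\u0323",    # e-with-circumflex + combining dot below
    "e\u0323\u0302",   # base letter, dot below, then circumflex
    "e\u0302\u0323",   # base letter, circumflex, then dot below
]

# After normalization all spellings collapse to a single sequence.
normalized = {unicodedata.normalize("NFC", s) for s in spellings}

print(len(set(spellings)), "distinct input sequences")  # 5 distinct input sequences
print(len(normalized), "sequence after NFC")            # 1 sequence after NFC
```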
-- Philippe.