From: Philippe Verdy (verdy_p@wanadoo.fr)
Date: Wed Nov 26 2003 - 05:29:37 EST
jameskass@att.net wrote:
> Briefly, it's my opinion that applications which claim to support
> and comply with Unicode should not 'step on' Unicode text. Any
> loopholes in the 'letter of the law' which allow applications to
> mung or reject Unicode text should be plugged.
If this "plugging" request is to be honored, it should also apply to HTML
and XML.
For now, combining characters can be encoded directly after the quote
character (single or double) that opens an attribute value, or directly
after a tag-closing ">". HTML and XML parsers treat these quotes and
greater-than signs as delimiters on their own, ignoring the combining
sequence that follows. This leaves defective combining sequences in the
parsed content, and that is a problem.
My opinion is that HTML and XML parsers should not take the quote or
greater-than sign in isolation, without considering the whole combining
sequence. This means that such occurrences should be treated as syntax
errors. If one really wants to create a Unicode-compliant XML/HTML document
containing defective sequences, these sequences should be encoded with
character entities...
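To make the proposed check concrete, here is a minimal sketch in Python of how
a validator might flag such occurrences. The function names are hypothetical,
and the scan is deliberately naive: it looks at raw characters rather than
parsing the markup, so it also reports combining marks after a closing quote
or after a quote inside ordinary text. It assumes that "combining character"
means any character whose general category is Mn, Mc or Me.

```python
import unicodedata
from typing import List


def is_combining(ch: str) -> bool:
    # Assumption: general categories Mn, Mc and Me are the combining marks.
    return unicodedata.category(ch).startswith("M")


def find_defective_positions(markup: str) -> List[int]:
    """Return the indexes of combining marks that directly follow a
    quote character or a '>' in the raw markup text."""
    delimiters = {'"', "'", ">"}
    return [
        i
        for i in range(1, len(markup))
        if markup[i - 1] in delimiters and is_combining(markup[i])
    ]
```

A stricter parser following the suggestion above would raise a syntax error
whenever this list is non-empty, instead of merely reporting positions.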
An XML/HTML code generator that produces a serialized document should then
know the list of combining characters, and encode them with numeric entities
when their use is defective (at the beginning of a CDATA section, of an
attribute value, or of a text element). This would completely "plug the
hole".
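As a rough illustration of the generator side, the following Python sketch
escapes combining marks found at the start of a value before it is written
out. The helper name is hypothetical, and "combining character" is again
assumed to mean general category Mn, Mc or Me. The sketch covers attribute
values and text content only: character references are not interpreted
inside CDATA sections, so that case would need different handling.

```python
import unicodedata
from xml.sax.saxutils import escape


def escape_leading_marks(value: str) -> str:
    """Serialize a text or attribute value, writing any combining marks at
    its very start as numeric character references."""
    i = 0
    # Count every leading mark, not just the first, so that no raw mark
    # ends up directly after the escaped ones in the serialized text.
    while i < len(value) and unicodedata.category(value[i]).startswith("M"):
        i += 1
    prefix = "".join(f"&#x{ord(ch):04X};" for ch in value[:i])
    # Escape the remainder as usual (&, <, > and the double quote).
    return prefix + escape(value[i:], {'"': "&quot;"})
```

For example, a value starting with U+0301 COMBINING ACUTE ACCENT would be
written with a leading "&#x0301;" so that the accent never directly follows
the opening quote or the ">" in the serialized document.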
This archive was generated by hypermail 2.1.5 : Wed Nov 26 2003 - 06:04:12 EST