From: John Cowan (jcowan@reutershealth.com)
Date: Tue Jul 08 2003 - 12:28:57 EDT
Philippe Verdy scripsit:
> Not bogus: the HTTP header is less important than an explicit
> declaration in the XML document.
You've misread me or RFC 3023 or both. The charset parameter in the MIME
header *overrides* the encoding declaration in the XML content. If the
header says "ISO 8859-1", then the character encoding of the contents is
ISO 8859-1, no matter what the encoding declaration says or doesn't say.
What is even worse is that if the media type is text/xml (as opposed to
application/xml), and the charset parameter is not specified, the
character encoding of the contents is US-ASCII, again no matter what
the encoding declaration says or doesn't say.
> The default UTF-8/UTF-16 only applies to the case where there is
> *neither* an XML declaration, *nor* an external meta-data declaration
> such as HTTP headers.
Correct.
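The precedence described above can be sketched roughly as follows. This is only an illustration, not a conforming implementation: the function name effective_xml_encoding and its parameters are made up for this example, only text/xml and application/xml are considered, and the fallback for application/xml without a charset is a simplified stand-in for the XML specification's own detection (it looks only at a BOM and an ASCII-compatible encoding declaration).

```python
import re

# Hypothetical helper: name and signature are assumptions for illustration.
_ENCODING_DECL = re.compile(
    rb'<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']'
)


def effective_xml_encoding(media_type, charset, body):
    """Pick the encoding a receiver would use, following the rules above.

    media_type: "text/xml" or "application/xml"
    charset:    value of the MIME charset parameter, or None if absent
    body:       raw bytes of the XML entity
    """
    # 1. A charset parameter in the MIME header overrides everything else.
    if charset:
        return charset.lower()

    # 2. text/xml without a charset parameter is US-ASCII, whatever the
    #    encoding declaration says.
    if media_type.lower() == "text/xml":
        return "us-ascii"

    # 3. application/xml without a charset: fall back to the document itself
    #    (simplified: BOM first, then the encoding declaration).
    if body.startswith(b"\xef\xbb\xbf"):
        return "utf-8"
    if body.startswith((b"\xfe\xff", b"\xff\xfe")):
        return "utf-16"
    match = _ENCODING_DECL.match(body)
    if match:
        return match.group(1).decode("ascii").lower()

    # 4. Neither external metadata nor a declaration: the XML default.
    return "utf-8"


doc = b'<?xml version="1.0" encoding="UTF-8"?><a/>'
print(effective_xml_encoding("text/xml", "ISO-8859-1", doc))
print(effective_xml_encoding("text/xml", None, doc))
print(effective_xml_encoding("application/xml", None, doc))
```

Under these assumptions the first call reports the header's charset, the second reports US-ASCII despite the declaration, and only the third consults the declaration.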
> However the BOM may be omitted from the "UTF-16" encoding scheme,
> and in that case it MUST be decoded only as UTF-16BE.
Actually, RFC 2781 says "SHOULD" in that case, not "MUST". I agree that this
should (or even must) be strengthened in future.
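As a rough illustration of the BOM-less case, a decoder for data labelled "UTF-16" might look like the sketch below. The function name decode_utf16_scheme is invented for this example. The big-endian fallback is written out explicitly because Python's built-in "utf-16" codec assumes the platform's native byte order when no BOM is present, which is usually not what RFC 2781 recommends.

```python
import codecs


def decode_utf16_scheme(data):
    """Decode bytes labelled "UTF-16" (hypothetical helper for illustration)."""
    # A leading BOM decides the byte order and is not part of the text.
    if data.startswith(codecs.BOM_UTF16_BE):
        return data[2:].decode("utf-16-be")
    if data.startswith(codecs.BOM_UTF16_LE):
        return data[2:].decode("utf-16-le")
    # No BOM: RFC 2781 says the text SHOULD be interpreted as big-endian.
    return data.decode("utf-16-be")


print(decode_utf16_scheme(b"\x00A\x00B"))          # no BOM, read as big-endian
print(decode_utf16_scheme(b"\xff\xfeA\x00B\x00"))  # little-endian BOM
```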
--
John Cowan  jcowan@reutershealth.com  www.ccil.org/~cowan  www.reutershealth.com
I must confess that I have very little notion of what [s. 4 of the British
Trade Marks Act, 1938] is intended to convey, and particularly the sentence
of 253 words, as I make them, which constitutes sub-section 1.  I doubt if
the entire statute book could be successfully searched for a sentence of
equal length which is of more fuliginous obscurity.  --MacKinnon LJ, 1940