From: Jon Hanna (jon@hackcraft.net)
Date: Sat Jan 22 2005 - 12:00:38 CST
> or if there's no charset
> specification in HTTP headers, but there's an internal charset
> specified in the document that indicates it's using the UTF-8
> "charset"
*Strictly*, in the absence of a charset parameter the header "Content-Type:
text/html" is supposed to be taken as having a default charset parameter of
"charset=iso-8859-1". This is one of the minor changes RFC 2616 (HTTP) made
in its use of MIME, under which the default charset parameter would have
been "charset=us-ascii".
In practice browsers tend to give <meta /> elements priority in such a case,
and even the MIME registration for text/html notes that the whole area of
default charset parameters is problematic. As such, while giving the <meta />
element priority is strictly against the letter of the standards, it is
probably within the general spirit of being "tolerant in what you accept".
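To make the two behaviours above concrete, here is a rough Python sketch. The function names, the regular expression and the 1024-byte scan window are my own assumptions for illustration, not any browser's actual algorithm; the only parts taken from the discussion are the order of precedence and the iso-8859-1 default.

```python
import re
from email.message import Message

# Hypothetical, deliberately loose pattern for a charset inside a <meta /> element.
META_CHARSET = re.compile(
    rb"<meta[^>]*charset\s*=\s*[\"']?([A-Za-z0-9_.:-]+)", re.IGNORECASE
)

def header_charset(content_type):
    """Return the charset parameter of a Content-Type value, or None."""
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_param("charset")

def resolve_html_charset(content_type, body, strict=False):
    """Pick a charset for a text/html response (illustrative only)."""
    declared = header_charset(content_type)
    if declared:
        # An explicit charset parameter in the HTTP header always wins.
        return declared.lower()
    if not strict:
        # Lenient behaviour: fall back to a <meta /> element near the start.
        match = META_CHARSET.search(body[:1024])
        if match:
            return match.group(1).decode("ascii").lower()
    # Strict reading of RFC 2616: the default is iso-8859-1.
    return "iso-8859-1"
```

With strict=True the <meta /> element is never consulted, which is the letter of RFC 2616; the default strict=False models the "tolerant" behaviour.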
When content is served as application/xhtml+xml, or if an XML declaration is
present, then really only the XML rules for dealing with absent charset
information in HTTP headers should be used; <meta /> elements should be
ignored.
> There's absolutely no need for the HTML or XML standard to
> say anything
> about the BOM, because this is specified elsewhere, in the charset
> definition (using the IANA definition of charsets, also referenced
> normatively by the optional MIME "content-type:" charset
> specifier) and
> its related standards.
For the most part, yes, they both work at a layer above the encoding and the
encoding deals with the BOM. XML does, however, have rules for determining
the encoding in the absence of any information about it, and those rules
therefore do have to deal with the BOM.
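As a rough illustration of those XML rules, the Python sketch below is loosely modelled on the (non-normative) autodetection appendix of the XML 1.0 Recommendation. The function name and the regular expression are my own, and it covers only the common cases: a BOM, a BOM-less UTF-16 "<?", an encoding declaration in an ASCII-compatible encoding, and the UTF-8 default.

```python
import codecs
import re

# Hypothetical pattern for the encoding pseudo-attribute of an XML declaration.
XML_DECL_ENCODING = re.compile(
    rb"<\?xml[^>]*encoding\s*=\s*[\"']([A-Za-z][A-Za-z0-9._-]*)"
)

def sniff_xml_encoding(data):
    """Guess the encoding of an XML entity when no external information exists."""
    # The UTF-32 BOMs are tested first because the UTF-32 LE BOM
    # begins with the same two bytes as the UTF-16 LE BOM.
    boms = [
        (codecs.BOM_UTF32_BE, "utf-32-be"),
        (codecs.BOM_UTF32_LE, "utf-32-le"),
        (codecs.BOM_UTF8, "utf-8"),
        (codecs.BOM_UTF16_BE, "utf-16-be"),
        (codecs.BOM_UTF16_LE, "utf-16-le"),
    ]
    for bom, name in boms:
        if data.startswith(bom):
            return name
    # No BOM: "<?" encoded as UTF-16 still gives the byte order away.
    if data.startswith(b"\x00<\x00?"):
        return "utf-16-be"
    if data.startswith(b"<\x00?\x00"):
        return "utf-16-le"
    # ASCII-compatible start: trust the encoding declaration if there is one.
    match = XML_DECL_ENCODING.match(data)
    if match:
        return match.group(1).decode("ascii").lower()
    # Neither BOM nor declaration: XML requires the entity to be UTF-8.
    return "utf-8"
```

This is where XML itself has to know about the BOM: the check happens before a single character can be decoded, so it cannot be left entirely to the encoding layer.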
Regards,
Jon Hanna
Work: <http://www.selkieweb.com/>
Play: <http://www.hackcraft.net/>
Chat: <irc://irc.freenode.net/selkie>