From: James Kass (jameskass@att.net)
Date: Thu Jul 13 2006 - 22:43:19 CDT
Philippe Verdy wrote,
> > The autodetection mechanism may be broken, but it can't really be blamed
> > for breaking the HTML code and structure. Without a character set
> > declaration, the HTML code is already broken. No HTML validator should
> > pass such a page.
>
> Why is that? The HTML code is correct, except when parsed with a multibyte charset,
> which should not occur as this is not declared, and also which should be
> detected by the heuristic mechanism when it attempts to identify the charset.
>
> Note that the page does not specify the DTD version, this is then to be parsed
> valid according to legacy HTML 3.2, and without the charset specification, an
> ISO 8859-based charset should be used. Using ISO 8859 makes no parsing error.
> Give me only one sentence in the HTML specs that says that the charset
> indication is mandatory! In legacy HTML 3.2, ISO 8859-1 is even a charset whose
> support is required, as confirmed in the normative DTDs, and the normative list
> of named entities.
The W3C only recommends the "charset" info in the meta tags section,
but it is not mandatory.
It should be, though. How can a parser parse if it doesn't know
which character set to use?
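To illustrate the question, here is a minimal Python sketch of how the same bytes read differently depending on which character set the parser assumes. The sample word and the helper name decode_as are my own assumptions for illustration; they are not taken from the page under discussion.

```python
# Hypothetical sample: the word "française" as a server might send it in UTF-8.
raw = "française".encode("utf-8")


def decode_as(data, charset):
    """Decode the bytes under an assumed charset (illustrative helper)."""
    return data.decode(charset, errors="replace")


# The same byte sequence, read under three different assumptions.
candidates = ["utf-8", "iso-8859-1", "iso-8859-15"]
readings = {name: decode_as(raw, name) for name in candidates}

for name, text in readings.items():
    print(name, "->", text)
```

Only one of the three readings gives back the intended word, and nothing in the bytes themselves says which one that is.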
In the case of the French Red Cross page, the W3C HTML validator
detects the character set as ISO-8859-15 and reports many errors
in the HTML. Manually overriding the ISO-8859-15 and making the
validator parse the web page as ISO-8859-1 still produces the same
serious errors in the HTML code of that page.
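One plausible reason the override makes no difference: ISO-8859-1 and ISO-8859-15 differ in only a handful of byte values, all above 0x7F, while HTML markup is written with ASCII bytes that both sets share. The sketch below uses Python's built-in codecs and a hypothetical helper, differing_bytes, to list those positions. It says nothing about how the validator works internally.

```python
def differing_bytes(charset_a, charset_b):
    """Return the byte values that decode differently in the two charsets."""
    diffs = []
    for value in range(256):
        byte = bytes([value])
        if byte.decode(charset_a) != byte.decode(charset_b):
            diffs.append(value)
    return diffs


diffs = differing_bytes("iso-8859-1", "iso-8859-15")
print([hex(v) for v in diffs])

# Markup characters such as "<", ">", "&" and quotes are ASCII in both sets.
markup = b'<p class="x">&amp;</p>'
same_markup = markup.decode("iso-8859-1") == markup.decode("iso-8859-15")
print(same_markup)
```

Since the tags and attribute syntax decode identically either way, structural errors in the markup would be reported under both settings.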
It would be interesting to see if correcting all the HTML errors
would enable MSIE 7 beta to correctly auto-detect the character set.
Quoting Chris Lilley (of w3.org)
( http://lists.xml.org/archives/xml-dev/199904/msg00081.html )
"But autodetection should not be required; users can label their
documents correctly."
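As a rough sketch of what labelling buys the parser: when the page carries a charset declaration in a meta tag, the parser can simply read it instead of guessing. The pattern and the function name declared_charset below are assumptions made for illustration, not the algorithm of any particular browser or validator, and the two sample pages are made up.

```python
import re

# Illustrative pattern: a charset value inside a <meta ...> tag near the top.
META_CHARSET = re.compile(
    rb"""<meta[^>]+charset\s*=\s*["']?([A-Za-z0-9._:-]+)""",
    re.IGNORECASE,
)


def declared_charset(page):
    """Return the charset named in a meta tag, or None if the page is unlabelled."""
    match = META_CHARSET.search(page[:1024])
    if match is None:
        return None
    return match.group(1).decode("ascii").lower()


labelled = (
    b'<html><head><meta http-equiv="Content-Type" '
    b'content="text/html; charset=ISO-8859-1"></head><body></body></html>'
)
unlabelled = b"<html><head><title>x</title></head><body></body></html>"

print(declared_charset(labelled))
print(declared_charset(unlabelled))
```

With the label present the answer is immediate; without it the function returns None and the parser is left to fall back on a default or on autodetection.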
Best regards,
James Kass
(off topic - with regards to AT&T blocking e-mail from Orange, for
many reasons I will be looking for a new ISP and regret that AT&T
blocks certain incoming messages. Because of the tremendous amount
of spam messages coming here, I was finally forced to use AT&T's
spam filter. This spam filter is not user-configurable. It is
surprising to hear that it blocks incoming valid messages such as
yours while still allowing all kinds of 419 scam letters through.)