Re: UTF-8 isn't the default for HTML (was: xkcd: LTR)

From: Philippe Verdy <verdy_p_at_wanadoo.fr>
Date: Thu, 29 Nov 2012 16:10:14 +0100

2012/11/29 Leif Halvard Silli <xn--mlform-iua_at_xn--mlform-iua.no>

> Philippe Verdy, Thu, 29 Nov 2012 14:24:29 +0100:
> ...
> > But why? Isn't UTF-8 (or alternatively UTF-16) already the default
> > encoding of XHTML?
> >
> > If not, then we should file a bug in the W3C Validator for not honoring
> > Guideline 9 (even though it is not part of the standard itself, but just
> > a recommendation, the Validator should at least issue a warning).
>
> This is exactly the problem. Your "if not" does apply! Because if one
> presents an XHTML document to the browser as HTML, then windows-1252,
> not UTF-8, becomes the default encoding. And, in fact, as a consequence
> of our dialog, I have notified the developers of Unicorn about the
> shortcoming, asking them to issue a warning.
>
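
To make the trap concrete, here is a minimal Python sketch of the kind of
check a validator could run. It is hypothetical, not Unicorn's actual
logic: the function name `needs_charset_warning`, the 1024-byte window
searched for a `<meta>` charset, and windows-1252 as the HTML fallback
(browsers really choose it per locale) are assumptions for illustration.

```python
import re


def needs_charset_warning(data: bytes, media_type: str) -> bool:
    """Hypothetical check: True when a document served as text/html would be
    decoded with the legacy fallback (assumed windows-1252) instead of an
    explicitly declared encoding."""
    essence, _, params = media_type.lower().partition(";")
    if essence.strip() != "text/html":
        return False  # not text/html: under XML rules the default is UTF-8
    if "charset=" in params:
        return False  # the HTTP Content-Type header declares the encoding
    if data.startswith((b"\xef\xbb\xbf", b"\xfe\xff", b"\xff\xfe")):
        return False  # a byte order mark identifies the encoding
    if re.search(rb"<meta[^>]+charset", data[:1024].lower()):
        return False  # <meta charset> or http-equiv Content-Type found
    # Pure ASCII decodes identically as UTF-8 and windows-1252, so only
    # documents containing non-ASCII bytes are at risk.
    return any(b >= 0x80 for b in data)
```

An XHTML page whose only declaration is `<?xml ... encoding="UTF-8"?>`
triggers the warning when served as `text/html`, but not when served as
`application/xhtml+xml`.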

Thanks a lot; this was really hard to see and understand, because I was
only reading the XHTML specs and the Validator did not complain.

As a side note, the Unicorn Validator, which "senses" the content type (in
its simple interface), will still detect XHTML content, which remains valid
by itself. The issue arises only when the document is presented as HTML.
Whenever the warning is displayed, the validator should let users see the
effect of HTML parsers (HTML4 or HTML5) on the XHTML document, by offering a
way to select a document type other than the autodetected one (XHTML here).
This matters because the XHTML document may not validate at all when parsed
as HTML. In that case the validator will first issue warnings about the
presence of XML prologs (which are generally not a problem, as browsers
typically ignore them), and then either an error about XML processing
instructions (though I don't think the optional leading XML declaration is
a "processing instruction"), or an error about a non-conforming document
type declaration (depending on the selected HTML flavor: HTML4 or HTML5).

Anyway, we can expect this page-design error to be frequent, so HTML5
parsers should preferably not discard the XML declaration, but at least
recognize its encoding pseudo-attribute (even if processing continues
using HTML rules rather than XML rules), instead of relying on the
presence of the meta element, which is really ugly and forces a reparse
using the detected encoding instead of the default windows-1252 (which is
unnecessarily slow).
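
A minimal Python sketch of what this proposal would mean for an HTML
parser's encoding detection, assuming a simplified order: byte order mark
first, then the XML declaration's encoding pseudo-attribute, then the
windows-1252 default. `sniff_html_encoding` is a hypothetical name, the
`<meta>` prescan is deliberately omitted, and this is not the algorithm
specified by the HTML5 draft.

```python
import re

# XML declaration carrying an encoding pseudo-attribute, following the XML 1.0
# grammar: version comes first; EncName = [A-Za-z] ([A-Za-z0-9._] | '-')*
_XML_DECL = re.compile(
    rb"""<\?xml\s+version\s*=\s*(["'])1\.[0-9]+\1"""
    rb"""\s+encoding\s*=\s*(["'])([A-Za-z][A-Za-z0-9._-]*)\2"""
)


def sniff_html_encoding(data: bytes, default: str = "windows-1252") -> str:
    """Hypothetical HTML-side sniffing that honours a leading XML declaration."""
    # 1. A byte order mark always wins.
    if data.startswith(b"\xef\xbb\xbf"):
        return "utf-8"
    if data.startswith(b"\xfe\xff"):
        return "utf-16-be"
    if data.startswith(b"\xff\xfe"):
        return "utf-16-le"
    # 2. The proposal: read encoding="..." from <?xml ...?> at offset 0,
    #    even though the rest of the document is parsed with HTML rules.
    match = _XML_DECL.match(data)
    if match:
        return match.group(3).decode("ascii").lower()
    # 3. The <meta> prescan is omitted in this sketch; use the legacy default.
    return default
```

Under this assumption, a page starting with
`<?xml version="1.0" encoding="UTF-8"?>` would be decoded as UTF-8 on the
first pass, with no `<meta>` element needed.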

This would make "Guideline 9" applicable only to flavors of HTML prior to
HTML5, once HTML5 is released. In that case, the warning issued by the
Validator would apply only to HTML4 or earlier, not to HTML5. This would
improve the compatibility of HTML5 parsers with valid XHTML1 and XHTML5
documents that were simply created or modified with XML or XHTML editors.
Received on Thu Nov 29 2012 - 09:12:31 CST