From: Doug Ewell (doug@ewellic.org)
Date: Mon Dec 21 2009 - 08:38:00 CST
Peter Krefting <peter at opera dot com> wrote:
>> "User agents must not support the CESU-8, UTF-7, BOCU-1 and SCSU
>> encodings."
>>
>> Amazing, isn't it? So thoughtful of the HTML 5 WG to protect
>> developers' time by prohibiting a handful of selected encodings.
>
> There are some security issues related to these, and they are very
> rarely used on actual web pages, which is why they are on the
> "prohibited" list. Full reasoning behind it can probably be found on
> the HTML5 mailing list, although I do not have a link to share. One of
> the problems is that they are not ASCII based, and theoretically
> something like "<script>" can be encoded in such a way that a naïve
> ASCII-based parser wouldn't find it and filter it away from
> user-submitted input, making it easier to do cross-domain attacks.

SCSU is completely ASCII-based, as long as the text is in single-byte
mode, which would be the case for the entire HTML header, and usually
the entire text when encoding small alphabets. In "Unicode mode," SCSU
is essentially UTF-16BE (with a non-ASCII escape for some private-use
characters), and UTF-16BE is not prohibited.
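
To make the comparison concrete, here is a minimal Python sketch of the UTF-16BE half of that argument. It is an illustration only: Python's standard library has no SCSU codec, so SCSU itself is not exercised, and the variable names are made up for the example. It shows that UTF-16BE, which is not on the prohibited list, already hides "<script>" from a byte-level ASCII search.

```python
# Illustration only: UTF-16BE stands in for SCSU's "Unicode mode" here,
# because the Python standard library does not ship an SCSU codec.
markup = "<script>"

ascii_bytes = markup.encode("ascii")
utf16be_bytes = markup.encode("utf-16-be")

# Each ASCII character becomes a 0x00 byte followed by the ASCII byte.
print(utf16be_bytes)  # b'\x00<\x00s\x00c\x00r\x00i\x00p\x00t\x00>'

# A byte-level search for the ASCII form therefore finds nothing.
naive_match = ascii_bytes in utf16be_bytes
print(naive_match)  # False
```

So "not ASCII-based" cannot by itself be what separates the prohibited encodings from the permitted ones.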

The security issue is largely a red herring. The security problems with
HTML encodings come from incorrect auto-detection of the encoding, not
from using encodings that have been properly announced. Even UTF-7,
while generally undesirable and unnecessary for Web pages, is "secure"
if correctly identified.
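
The distinction can be sketched in a few lines of Python. This is a hedged illustration, not how any real sanitizer or browser works: the two filter functions and the payload constant are hypothetical names invented for the example, and the only library call relied on is the standard "utf-7" codec. The payload is the familiar UTF-7 spelling of a script element.

```python
# Hypothetical example: both filters and PAYLOAD are invented for
# illustration. Only the standard-library "utf-7" codec is assumed.
PAYLOAD = b"+ADw-script+AD4-alert(1)+ADw-/script+AD4-"


def naive_filter_flags(raw: bytes) -> bool:
    """Scan the raw bytes as if they were ASCII, ignoring the charset."""
    return b"<script" in raw.lower()


def charset_aware_filter_flags(raw: bytes, declared_charset: str) -> bool:
    """Decode with the announced charset first, then scan the text."""
    text = raw.decode(declared_charset)
    return "<script" in text.lower()


# The byte-level scan misses the element entirely.
print(naive_filter_flags(PAYLOAD))  # False

# Once the announced encoding is honored, the element is plainly visible.
print(PAYLOAD.decode("utf-7"))  # <script>alert(1)</script>
print(charset_aware_filter_flags(PAYLOAD, "utf-7"))  # True
```

The gap only opens when the filter and the user agent disagree about the encoding, which is the auto-detection problem rather than a property of UTF-7 as such.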

Henri Sivonen stated that the main reason for prohibiting encodings was
to avoid "wasting developer time" and to focus attention on supporting
new features instead. Apparently he didn't feel developers were capable
of both.

--
Doug Ewell | Thornton, Colorado, USA | http://www.ewellic.org
RFC 5645, 4645, UTN #14 | ietf-languages @ http://is.gd/2kf0s