From: Doug Ewell (doug@ewellic.org)
Date: Wed Dec 30 2009 - 08:30:07 CST
"Phillips, Addison" <addison at amazon dot com> wrote:
> UTF-7, BOCU, and SCSU are banned either because they auto-detect as
> something other than themselves or because an otherwise "innocuous"
> byte sequence detects as being one of them, thus serving as the basis
> for an XSS attack.
What does SCSU auto-detect as? In an HTML or XML environment, where the
stream starts with a Basic Latin run, SCSU should look like Latin-1,
eventually followed by a single-byte mode tag: a C0 control character
that is not NUL, HT, CR, FF, or LF. (If there is no SCSU tag, then the
text *is* Latin-1, except that any control characters whose values
coincide with single-byte tags are prefixed with 01.)
An initial run of ASCII followed by, say, 0x12 ought to be a reliable
sign of SCSU, unless you have reason to suspect VISCII. The only time
this would fail is if the encoder author decided to be a smart-aleck and
switch into Unicode mode to encode initial ASCII.
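A rough Python sketch of that heuristic follows. It is illustrative only: the function name looks_like_scsu is made up here, the tag values are taken from the single-byte mode tag table in UTS #6 (01-08, 0B, 0E-1F, with 0C reserved), and it deliberately gives up on any byte above 7F seen before a tag, since the test described above is about an initial ASCII run. It does not attempt to rule out VISCII or arbitrary binary data.

```python
# SCSU single-byte mode tags per UTS #6: SQ0-SQ7 (01-08), SDX (0B),
# SQU/SCU (0E-0F), SC0-SC7 (10-17), SD0-SD7 (18-1F). 0C is reserved.
SCSU_TAGS = frozenset(range(0x01, 0x09)) | {0x0B} | frozenset(range(0x0E, 0x20))

# C0 values that SCSU passes through unchanged in single-byte mode.
PASS_THROUGH = frozenset({0x00, 0x09, 0x0A, 0x0D})


def looks_like_scsu(data: bytes) -> bool:
    """Hypothetical heuristic: a non-empty ASCII run, then an SCSU tag byte."""
    seen_ascii = False
    for b in data:
        if b in SCSU_TAGS:
            # A tag counts only if some ASCII came before it.
            return seen_ascii
        if b in PASS_THROUGH or 0x20 <= b <= 0x7F:
            seen_ascii = True
            continue
        # Reserved 0C, or a byte >= 80 before any tag: stop guessing.
        return False
    # Pure ASCII with no tag is indistinguishable from ASCII/Latin-1.
    return False


if __name__ == "__main__":
    print(looks_like_scsu(b"<html>\x12\xa0"))      # ASCII run, then SC2
    print(looks_like_scsu(b"<html>plain</html>"))  # no tag at all
```

An encoder that switches into Unicode mode right away emits 0F as the first byte, so there is no ASCII run and this check reports False, which is the failure case mentioned above.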
BOCU-1, on the other hand, auto-detects as Latin-1 gibberish.
--
Doug Ewell | Thornton, Colorado, USA | http://www.ewellic.org
RFC 5645, 4645, UTN #14 | ietf-languages @ http://is.gd/2kf0s