From: Jill Ramonsky (Jill.Ramonsky@Aculab.com)
Date: Tue Sep 30 2003 - 06:44:50 EDT
Good point. But there has to be an actual attacker here: a hacker
engaged in a deliberately malevolent attempt to (say) run arbitrary code
on a victim's machine (the victim being an end user, a web-page
viewer). To achieve this, the attacker must exploit "features" of the
victim's browser. Yes, I was assuming that the attacker was a document
author, but if the attacker was a server (or at least a server
administrator), then it's difficult to see what a document author can do
to guard against this. If the server is an attacker, it could of
course modify every document it serves, in any manner it chose. In
such a circumstance, document authors would be well advised to move
their documents to another server... assuming they ever found out.
The attack is only theoretical, so far as I know, but basically it works
like this: the attacker places a link to (say)
"C:\WINNT\SYSTEM32\CMD.EXE (plus some nasty parameters)" in a hyperlink
and encourages you to click on it. If all is well, the browser should
forbid this. But if the string is written in encoding A, and the
browser parses it assuming it to be encoding B, the browser may fail to
recognise the path as absolute, and so may allow it. Of course, you'd
have to try /really hard/ to find encodings A and B for which this
becomes feasible, but you never know, it might be doable. Plus, you'd
have to find a user dumb enough to be running a browser old enough to
still be prone to this exploit. (I'm pretty sure modern browsers will
have closed that hole by now, but again, you never know.) But even a
buggy and stupid browser will never fall victim to this exploit if it
is able to infer the correct encoding for the document.
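A toy illustration of the encoding-confusion idea, in Python. Everything here is hypothetical: `blocked` stands in for whatever check a browser might apply, and Shift_JIS vs Latin-1 is just one convenient pair of encodings A and B. In Shift_JIS the byte 0x5C (a backslash in ASCII) can also be the second byte of a double-byte character such as 0x95 0x5C, so the two readings disagree about where the path separators are.

```python
# Toy illustration only; not any real browser's code.
# The same bytes, read under two encodings, disagree about where the
# backslash path separators are.

# 0x95 0x5C is a single double-byte character in Shift_JIS,
# but two separate characters (one of them a backslash) in Latin-1.
RAW = b"C:\\WINNT\x95\\SYSTEM32"


def blocked(link_text: str) -> bool:
    """Hypothetical naive filter: refuse links that reach into SYSTEM32."""
    return "\\SYSTEM32" in link_text.upper()


as_latin1 = RAW.decode("latin-1")   # every byte becomes its own character
as_sjis = RAW.decode("shift_jis")   # 0x95 0x5C collapses into one character

for name, text in (("latin-1", as_latin1), ("shift_jis", as_sjis)):
    print(name, repr(text), "blocked" if blocked(text) else "allowed")
```

Reading the bytes as Latin-1 exposes `\SYSTEM32` and trips the filter; reading them as Shift_JIS folds that separator into a CJK character, so the same bytes slip through. A browser that inferred the encoding the author actually used would see one consistent string.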
But look at it like this. Suppose an HTML document had a meta tag which
claimed <META HTTP-EQUIV="Content-length" CONTENT="1">. In this
circumstance, which would you prefer to believe: the HTTP Content-length
header, or the meta tag? (One can certainly imagine buffer-overrun
exploits if browsers were to make the wrong choice.)
Of course, having said that, document authors /can/ affect HTTP headers
directly anyway. If the document were written in PHP instead of HTML,
the author could generate any HTTP headers they wanted! (I've actually
done this to deliver documents in UTF-8, overriding the server's
default.) All I can assume is that there's some sort of threat model in
place which assumes that anyone who can code in PHP can't possibly be an
attacker! If so, it's clearly nonsense.
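For comparison, here is a minimal sketch of the same idea outside PHP (where the corresponding call is `header()`): a Python WSGI application that sets its own `Content-Type` charset regardless of the server's default. The page content is made up for illustration.

```python
# Minimal sketch: a WSGI app that chooses its own charset header,
# analogous to calling header() from PHP. The page body is illustrative.


def app(environ, start_response):
    # Encode the page as UTF-8 ourselves...
    body = "<p>No mojibake here: \u00e9\u00e8\u00ea</p>".encode("utf-8")
    # ...and declare that choice in the HTTP header, overriding whatever
    # default charset the server might otherwise have sent.
    start_response("200 OK", [
        ("Content-Type", "text/html; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]
```

To try it locally, `wsgiref.simple_server.make_server("", 8000, app).serve_forever()` will serve it; the charset that reaches the browser is the one the application chose, not the server's.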
I still maintain, though (in agreement with Jon), that a server should
obey the document author by taking notice of meta tags and transforming
them into HTTP headers. (At the very /least/, it should take the meta
tag as a hint, and send it as an HTTP header if the hint turns out to be
correct.) To ignore them altogether is just dumb.
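One way a server could treat the meta tag "as a hint" is sketched below in Python. This is an assumption-laden sketch, not how any particular server works: `header_from_meta` is a hypothetical helper that looks for `<meta http-equiv="Content-Type">`, and "the hint turns out to be correct" is approximated by checking that the charset name is known and that the whole document decodes cleanly under it. A clean decode is only weak evidence (Latin-1, for instance, accepts any byte sequence), and the pre-scan assumes an ASCII-compatible encoding, so UTF-16 pages would need separate handling.

```python
import codecs
from html.parser import HTMLParser


class MetaCharsetFinder(HTMLParser):
    """Records the charset from the first <meta http-equiv="Content-Type">."""

    def __init__(self):
        super().__init__()
        self.charset = None

    def handle_starttag(self, tag, attrs):
        # HTMLParser already lowercases tag and attribute names.
        if tag != "meta" or self.charset is not None:
            return
        a = {k: (v or "") for k, v in attrs}
        if a.get("http-equiv", "").lower() != "content-type":
            return
        for part in a.get("content", "").split(";"):
            key, _, value = part.strip().partition("=")
            if key.lower() == "charset" and value:
                self.charset = value.strip().strip("\"'")


def header_from_meta(raw: bytes):
    """Hypothetical helper: return a Content-Type header built from the
    meta hint, or None if there is no hint or it does not check out."""
    finder = MetaCharsetFinder()
    # Pre-scan only: Latin-1 maps every byte, so ASCII markup survives.
    finder.feed(raw.decode("latin-1"))
    finder.close()
    charset = finder.charset
    if charset is None:
        return None
    try:
        codecs.lookup(charset)  # unknown charset name: ignore the hint
        raw.decode(charset)     # document must at least decode cleanly
    except (LookupError, UnicodeDecodeError):
        return None
    return ("Content-Type", f"text/html; charset={charset}")
```

A page whose meta tag says UTF-8 but whose bytes are really Latin-1 fails the decode check and gets no header, so the server falls back to its default rather than repeating a false claim.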
Jill
PS. I haven't mentioned Unicode domain names. That's a different kettle 
of fish altogether. Maybe we could have another thread for that.
 
 > -----Original Message-----
 > From: Peter Kirk [mailto:peterkirk@qaya.org]
 > Sent: Monday, September 29, 2003 5:33 PM
 > To: Jill Ramonsky
 > Cc: unicode@unicode.org
 > Subject: Re: Fun with proof by analogy, was Re: Mojibake on my Web pages
 >
 >
 > I know I don't understand all the issues here, but I think I spot one
 > flaw in the argument. This seems to imply that all security holes are
 > the work of the content providers and none related to the servers. In
 > other words, that all servers and their administrators are entirely
 > trustworthy. This is certainly not necessarily true. And if a content
 > provider can compromise security by confusing encodings, so
 > can a server.
 >
 > This could become a significant security hole when we get Unicode
 > domain names. A malicious server administrator could register the
 > mojibake equivalent of a legitimate security sensitive domain name and
 > then deliberately serve the mojibake version to users, etc etc.
 >
This archive was generated by hypermail 2.1.5 : Tue Sep 30 2003 - 07:43:38 EDT