From: Jill Ramonsky (Jill.Ramonsky@Aculab.com)
Date: Mon Sep 29 2003 - 11:01:49 EDT
I don't see anything wrong with the spec. So far as I can see it is 
doing the right thing, although the behaviour of the described server 
could be better.
First point - if no information is present, assume "us-ascii". Sounds 
/extremely sensible/ to me. ASCII is the intersection of Latin-1, UTF-8, 
and various other commonly used encodings. Moreover, in order to even 
/read/ the name of the encoding, the name of the encoding must have 
itself been encoded in /something/. It makes sense to me to assume the 
absolute minimum. If you want more than the minimum, declare your 
encoding. This should not be a problem.
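
As a rough illustration of the "intersection" point, here is a small Python sketch. The sample string and the variable names are my own, chosen only for the example: bytes in the ASCII range decode to the same text under all three encodings, while anything outside that range does not.

```python
# Bytes in the range 0x00-0x7F mean the same thing in all three encodings,
# so an encoding declaration written in ASCII can be read before the
# real encoding is known.
sample = b'<?xml version="1.0" encoding="ISO-8859-1"?>'

decoded = {name: sample.decode(name) for name in ("us-ascii", "latin-1", "utf-8")}
assert len(set(decoded.values())) == 1  # identical text under each encoding

# Outside that range the encodings disagree: the same character
# is represented by different byte sequences.
e_acute = "\u00e9"
latin1_bytes = e_acute.encode("latin-1")  # a single byte
utf8_bytes = e_acute.encode("utf-8")      # two bytes
assert latin1_bytes != utf8_bytes
```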
Second point - the "search order" - (1) HTTP header sent by the server; 
(2) XML declaration; (3) HTML meta tag. This also makes sense to me. 
Yes, the document author should know best, but it is the /_server_/, 
not the /_client_/, which should take notice of the meta tag.
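
The search order can be sketched in Python roughly as follows. This is only an illustration under my own assumptions: the function name detect_charset and the regular expressions are hypothetical and deliberately simplistic, not taken from any spec or browser. It also does not model the stricter reading in the quoted message below, where "text/xml" or "text/html" without a charset is treated as "us-ascii" straight away.

```python
import re

# Hypothetical, simplified patterns; a real client parses these far more carefully.
_HEADER_CHARSET = re.compile(r'charset\s*=\s*"?([\w.:-]+)"?', re.IGNORECASE)
_XML_DECL = re.compile(rb'^<\?xml[^>]*encoding\s*=\s*["\']([\w.:-]+)["\']', re.IGNORECASE)
_META_CHARSET = re.compile(rb'<meta[^>]*charset\s*=\s*["\']?([\w.:-]+)', re.IGNORECASE)


def detect_charset(content_type, body):
    """Return the charset chosen by the search order: header, XML declaration, meta tag."""
    # (1) The HTTP header wins whenever it names a charset.
    if content_type:
        match = _HEADER_CHARSET.search(content_type)
        if match:
            return match.group(1).lower()
    # (2) and (3) Only then look inside the document itself.
    for pattern in (_XML_DECL, _META_CHARSET):
        match = pattern.search(body[:1024])
        if match:
            return match.group(1).decode("ascii").lower()
    # Nothing declared anywhere: assume the absolute minimum.
    return "us-ascii"
```

With a header of "text/html; charset=ISO-8859-1" and a body whose meta tag says UTF-8, this returns "iso-8859-1": the meta tag is never consulted.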
As far as the browser is concerned, meta tags in the document _/must 
not/_ override the headers, as this could result in security holes 
exploitable by attackers.
The issue is slightly more complicated than that, though. The browser 
/must/ believe the HTTP headers. However, if the meta tags and HTTP 
headers are in conflict then I believe _the server is at fault_, in not 
making the correct declaration. That is, if the document author says (in 
a meta tag) "this is in UTF-8", then the server should (in my opinion) 
send the document to the browser with an encoding type of UTF-8, 
changing the HTTP header to match the meta tag so that the two never 
conflict. If a server does not do this, the browser must still believe 
the HTTP header.
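
What I have in mind for the server could look something like the Python sketch below. The function name content_type_header, its parameters and the regular expression are all my own invention for illustration; no real server's API is implied. The server reads the author's meta tag and builds its header from it, falling back to its configured default only when the document declares nothing.

```python
import re

# Hypothetical, simplified pattern for a charset declared in an HTML meta tag.
_META_CHARSET = re.compile(rb'<meta[^>]*charset\s*=\s*["\']?([\w.:-]+)', re.IGNORECASE)


def content_type_header(body, media_type="text/html", configured_charset=None):
    """Build a Content-Type value that never contradicts the document's meta tag."""
    match = _META_CHARSET.search(body[:1024])
    declared = match.group(1).decode("ascii").lower() if match else None
    # The author's declaration takes priority over the server's default.
    charset = declared or configured_charset
    if charset is None:
        return media_type
    return f"{media_type}; charset={charset}"
```

A server configured for ISO-8859-1 would then still send "text/html; charset=utf-8" for a page whose meta tag declares UTF-8, so the browser can go on trusting the header.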
Jill
 > -----Original Message-----
 > From: John Cowan [mailto:cowan@mercury.ccil.org]
 > Sent: Saturday, September 27, 2003 3:48 PM
 > To: jameskass@att.net
 > Cc: unicode@unicode.org
 > Subject: Re: Fun with proof by analogy, was Re: Mojibake on my Web pages
 >
 >
 > jameskass@att.net scripsit:
 >
 > > First, the browser checks the HTTP header, then the XML declaration
 > > (which is not relevant to HTML), then the HTML meta tag.
 > >
 > > Apparently, upon finding character set information, the operation
 > > stops, so if information is present in the HTTP header, the meta
 > > tag won't be consulted.
 >
 > It's worse than that.  If the HTTP header says "text/xml" or
 > "text/html", and no charset information is provided, a fully
 > conforming browser MUST treat this as if the charset "us-ascii" is
 > specified.  That's just insane, but such are the rules.
 >
 > Only if there is no header, or if the header says "application/xml",
 > do we get to proceed to other sources of knowledge.
 >
 > > All of the data should be consulted and there should be some kind
 > > of protocol in place to handle conflicting character set info.
 >
 > It *is* in place and fully specified.  It's just that most of us
 > don't care for the results, and most programs don't fully conform
 > for that reason.
 >
 > --
 > Some people open all the Windows;       John Cowan
 > wise wives welcome the spring           jcowan@reutershealth.com
 > by moving the Unix.                     http://www.reutershealth.com
 >   --ad for Unix Book Units (U.K.)       http://www.ccil.org/~cowan
 >         (see http://cm.bell-labs.com/cm/cs/who/dmr/unix3image.gif)
 >