From: Jill Ramonsky (Jill.Ramonsky@Aculab.com)
Date: Mon Sep 29 2003 - 11:01:49 EDT
I don't see anything wrong with the spec. As far as I can see, it is
doing the right thing, although the behaviour of the described server
could be better.
First point - if no information is present, assume "us-ascii". Sounds
/extremely sensible/ to me. ASCII is the intersection of Latin-1, UTF-8,
and various other commonly used encodings. Moreover, in order to even
/read/ the name of the encoding, that name must itself have been
encoded in /something/. It makes sense to me to assume the
absolute minimum. If you want more than the minimum, declare your
encoding. This should not be a problem.
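To make the "intersection" point concrete, here is a minimal Python sketch. It checks that every byte from 0x00 to 0x7F decodes to the same character under ASCII, Latin-1, UTF-8 and Windows-1252 (Python's codec names `ascii`, `latin-1`, `utf-8`, `cp1252`), while a byte such as 0x80 already means different things in Latin-1 and Windows-1252. The helper name `decodes_identically` is hypothetical, made up for illustration.

```python
# Every 7-bit byte decodes to the same character under these encodings,
# which is why "us-ascii" is a safe minimal assumption.
ASCII_BYTES = bytes(range(128))
ENCODINGS = ["ascii", "latin-1", "utf-8", "cp1252"]


def decodes_identically(data, encodings):
    """Hypothetical helper: True if `data` decodes to one and the same
    string under every encoding in `encodings`."""
    decoded = {data.decode(enc) for enc in encodings}
    return len(decoded) == 1


print(decodes_identically(ASCII_BYTES, ENCODINGS))          # True
# Above 0x7F the encodings part company: 0x80 is U+0080 in Latin-1
# but the euro sign in Windows-1252.
print(decodes_identically(b"\x80", ["latin-1", "cp1252"]))  # False
```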
Second point - the "search order" - (1) server (HTTP header); (2) XML
declaration; (3) HTML meta tag. This also makes sense to me. Yes, the document author should
know best, but it is the /_server_/, not the /_client_/, which should
take notice of the meta tag.
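The search order can be modelled as "first declaration found wins, otherwise us-ascii". This is only a simplified Python sketch under stated assumptions: the function names are hypothetical, the inputs are assumed to be already-extracted values (the raw Content-Type header, the encoding name from the XML declaration, the charset from the meta tag), and it ignores the media-type-dependent rules, such as what a charset-less "text/xml" header implies.

```python
import re
from typing import Optional


def charset_from_content_type(header: str) -> Optional[str]:
    """Hypothetical helper: pull the charset parameter out of a
    Content-Type header value, e.g. 'text/html; charset=UTF-8'."""
    match = re.search(r'charset\s*=\s*"?([^";\s]+)"?', header, re.IGNORECASE)
    return match.group(1).lower() if match else None


def effective_charset(http_content_type: Optional[str],
                      xml_encoding: Optional[str],
                      meta_charset: Optional[str]) -> str:
    """Return the first charset found in the order (1) HTTP header,
    (2) XML declaration, (3) HTML meta tag; fall back to us-ascii."""
    candidates = [
        charset_from_content_type(http_content_type) if http_content_type else None,
        xml_encoding,
        meta_charset,
    ]
    for candidate in candidates:
        if candidate:
            return candidate.lower()
    # Nothing declared anywhere: assume the absolute minimum.
    return "us-ascii"


# The HTTP header wins even though the meta tag disagrees.
print(effective_charset("text/html; charset=ISO-8859-1", None, "utf-8"))  # iso-8859-1
print(effective_charset(None, None, None))  # us-ascii
```

Stopping at the first declaration found is what makes a wrong HTTP header impossible for the meta tag to correct on the client side.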
As far as the browser is concerned, meta tags in the document _/must
not/_ override the headers, as this could result in security holes
exploitable by attackers.
The issue is slightly more complicated. The browser /must/ believe the
HTTP headers. However, if the meta tags and HTTP headers are in conflict,
then I believe _the server is at fault_ for not making the correct
declaration. For example, if the document author says (in a meta tag)
"this is in UTF-8", then the server should (in my opinion) send the
document to the browser with an encoding type of UTF-8. In other words,
the server should (again, in my opinion) ensure that the HTTP header is
not in conflict with the meta tag, by changing the HTTP header to match
the meta tag. If a server does not do this, however, the browser must
still believe the HTTP header.
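A server that takes notice of the meta tag might do something like this minimal Python sketch: scan the start of the document for a meta charset declaration and build a Content-Type header that agrees with it. The function names, the regular expression and the 1024-byte scan limit are illustrative assumptions, not taken from any particular server; the fallback to "us-ascii" follows the first point above.

```python
import re
from typing import Optional

# Hypothetical pattern: matches <meta charset="..."> as well as
# <meta http-equiv="Content-Type" content="text/html; charset=...">.
META_CHARSET = re.compile(
    rb'<meta[^>]+charset\s*=\s*["\']?([A-Za-z0-9._:-]+)', re.IGNORECASE)


def meta_charset(document: bytes, scan_limit: int = 1024) -> Optional[str]:
    """Look for a meta charset declaration near the top of the document
    (the scan limit is an assumed value, chosen for illustration)."""
    match = META_CHARSET.search(document[:scan_limit])
    return match.group(1).decode("ascii").lower() if match else None


def content_type_header(document: bytes, default: str = "us-ascii") -> str:
    """Build a Content-Type header that does not conflict with the
    document's own meta tag; use `default` when there is none."""
    charset = meta_charset(document) or default
    return f"text/html; charset={charset}"


page = b'<html><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"></head></html>'
print(content_type_header(page))  # text/html; charset=utf-8
```

Because the header is derived from the document, the browser can keep believing the HTTP header and still end up with the author's declared encoding.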
Jill
> -----Original Message-----
> From: John Cowan [mailto:cowan@mercury.ccil.org]
> Sent: Saturday, September 27, 2003 3:48 PM
> To: jameskass@att.net
> Cc: unicode@unicode.org
> Subject: Re: Fun with proof by analogy, was Re: Mojibake on
> my Web pages
>
>
> jameskass@att.net scripsit:
>
> > First, the browser checks the HTTP header, then the XML declaration
> > (which is not relevant to HTML), then the HTML meta tag.
> >
> > Apparently, upon finding character set information, the operation
> > stops, so if information is present in the HTTP header, the meta
> > tag won't be consulted.
>
> It's worse than that. If the HTTP header says "text/xml" or
> "text/html", and no charset information is provided, a fully
> conforming browser MUST treat this as if the charset "us-ascii" is
> specified. That's just insane, but such are the rules.
>
> Only if there is no header, or if the header says "application/xml",
> do we get to proceed to other sources of knowledge.
>
> > All of the data should be consulted and there should be some kind
> > of protocol in place to handle conflicting character set info.
>
> It *is* in place and fully specified. It's just that most of us
> don't care for the results, and most programs don't fully conform
> for that reason.
>
> --
> Some people open all the Windows; John Cowan
> wise wives welcome the spring jcowan@reutershealth.com
> by moving the Unix. http://www.reutershealth.com
> --ad for Unix Book Units (U.K.) http://www.ccil.org/~cowan
> (see http://cm.bell-labs.com/cm/cs/who/dmr/unix3image.gif)
>