From: Francois Yergeau (FYergeau@alis.com)
Date: Mon Sep 29 2003 - 10:27:18 EDT
James Kass wrote:
> In the event of a conflict between the HTTP header and the HTML meta
> tag, of course the browser should believe the HTML meta tag. After
> all, who knows better than the author the encoding used to construct
> the file?
Who knows better the encoding used to *send* the file? The last server to
touch it.
It used to be common, the norm in fact, for Russian servers to store files
in various legacy encodings (KOI-8, 8859-5, DOS-something, ...) and to serve
them in some other encoding, after transcoding on the fly based on the
User-Agent. There were also transcoding proxies for Asian character sets
that one could use to overcome the limitations of browsers of that era.
These practices were still around when the HTML 4 spec was released in 1997
and no doubt contributed to the spec giving the HTTP header precedence over
the meta tag.
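To make the transcoding scenario concrete, here is a minimal Python sketch of a client that trusts the HTTP header first and falls back to the meta tag only when the header carries no charset. The function names, the regular expression, and the ISO-8859-1 fallback are assumptions made for illustration; a real browser uses a proper HTML parser and its own heuristics.

```python
import re

# Naive pattern for a charset declared inside a <meta> tag (illustrative only;
# real-life HTML needs a real parser).
META_RE = re.compile(
    rb'<meta[^>]+charset\s*=\s*["\']?([A-Za-z0-9_.:-]+)', re.IGNORECASE)


def charset_from_header(content_type):
    """Return the charset parameter of a Content-Type value, or None."""
    if not content_type:
        return None
    for param in content_type.split(";")[1:]:
        name, _, value = param.partition("=")
        if name.strip().lower() == "charset" and value.strip():
            return value.strip().strip('"').lower()
    return None


def charset_from_meta(body):
    """Return the charset named in a meta tag near the start of the body."""
    match = META_RE.search(body[:1024])
    return match.group(1).decode("ascii").lower() if match else None


def pick_charset(content_type, body, default="iso-8859-1"):
    """HTTP header first, then meta tag, then an assumed fallback."""
    return (charset_from_header(content_type)
            or charset_from_meta(body)
            or default)


# The author saved the file as KOI8-R and labelled it so in the meta tag.
stored = ('<meta http-equiv="Content-Type" '
          'content="text/html; charset=koi8-r"><p>Привет</p>').encode("koi8-r")

# The server transcodes on the fly and announces the new encoding in the
# header, but leaves the (now stale) meta tag untouched.
sent = stored.decode("koi8-r").encode("windows-1251")
header = "text/html; charset=windows-1251"

chosen = pick_charset(header, sent)
text = sent.decode(chosen)
```

Here the header names the encoding actually on the wire, so `text` contains the intended Russian word, whereas decoding `sent` with the meta tag's `koi8-r` would yield garbage.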
> Where the server has performed a character set conversion
> upon request from a browser, then, as a part of the character set
> conversion process, the HTML meta tag needs to be re-written in case
> the page is archived by the visitor for later off-line viewing.
It takes large amounts of tricky code to reliably parse real-life HTML. It
is unreasonable to expect servers, which have no business parsing HTML, to
contain this code. Browsers have it and *they* should adjust the meta tag
when they do a "Save as..."
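As a rough illustration of what adjusting the meta tag on save could look like, the Python sketch below rewrites the declared charset to match the encoding the file is written in. The `save_as` name and the regular expression are hypothetical, and the pattern is deliberately naive: a browser would work from the document it has already parsed rather than from a regex.

```python
import re

# Naive match for the charset value inside the first <meta> tag that has one.
CHARSET_IN_META = re.compile(
    r'(<meta\b[^>]*\bcharset\s*=\s*["\']?)([A-Za-z0-9_.:-]+)', re.IGNORECASE)


def save_as(decoded_text, target_encoding="utf-8"):
    """Return the bytes to write to disk, with the meta charset updated
    to name the encoding actually used for the saved file."""
    fixed = CHARSET_IN_META.sub(
        lambda m: m.group(1) + target_encoding, decoded_text, count=1)
    return fixed.encode(target_encoding)


page = ('<meta http-equiv="Content-Type" '
        'content="text/html; charset=koi8-r"><p>Привет</p>')
saved = save_as(page)
```

After this step the saved bytes and the label inside them agree, so the file can be opened off-line without any HTTP header to consult.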
> If this were the case, we wouldn't be having this thread.
If servers would just shut up when they don't know (as required by the HTML
spec)....
-- François Yergeau