charset in HTTP vs. HTML meta (was Re: UTF-16 and HTML META charset)

From: Glen Perkins (Glen.Perkins@NativeGuide.com)
Date: Tue Feb 22 2000 - 00:40:05 EST


From: Erik van der Poel <erik@netscape.com>
To: Unicode List <unicode@unicode.org>
Cc: Unicode List <unicode@unicode.org>
Sent: Saturday, February 19, 2000 10:37 PM
Subject: Re: UTF-16 and HTML META charset

>
> The browser is supposed to ignore META charset when HTTP charset is
> present. Which version of Netscape did you test? That version may have a
> bug in META charset handling.
>

Yes, this is a question I was discussing with Andrea Vine and some others a
few days ago: whether 'tis nobler to use HTTP headers, HTML meta tags, or
both, under various real-world circumstances. What are the rules for this

1) in any standard,
2) in practice in various versions of Netscape Navigator, and
3) in practice in various versions of IE?

Given the current state of things, what's the best approach to serving up
dynamic content in multiple languages?

Assume you're trying to create a website with dynamically generated pages in
lots of languages, but only one language per page. It's not necessarily easy
to tell the server, page by page, what encoding is being transmitted. Is the
safest, most reliable approach currently to use only the most common
ASCII-based legacy encodings, send no charset parameter in the HTTP header
(that is, no "Content-Type: text/html; charset=foo"), and instead include the
ASCII meta (http-equiv) tag on every page?
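
To make the meta-only approach concrete, here is a minimal Python sketch of
what a page generator might do. The helper name render_page, the sample text,
and the choice of Shift_JIS are assumptions made for illustration only; they
do not describe any particular server.

```python
def render_page(body_html: str, charset: str) -> tuple[dict[str, str], bytes]:
    """Return (headers, payload) for one single-language page.

    Hypothetical helper: the charset is declared only in the meta tag,
    and the HTTP Content-Type header carries no charset parameter.
    """
    meta = f'<meta http-equiv="Content-Type" content="text/html; charset={charset}">'
    document = (
        "<html><head>"
        f"{meta}"
        "<title>Example</title></head>"
        f"<body>{body_html}</body></html>"
    )
    # Encode the whole document in the declared encoding. For an
    # ASCII-compatible encoding the meta tag bytes stay plain ASCII,
    # so a client can find the tag before it knows the encoding.
    payload = document.encode(charset)
    headers = {"Content-Type": "text/html"}  # deliberately no charset=...
    return headers, payload


headers, payload = render_page("こんにちは", "shift_jis")
```

The point of the sketch is only that the declaration travels inside the page
itself, so the code producing the page does not have to coordinate with
whatever sets the HTTP headers.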

(The reason for this approach, by the way, is that it would both work
reliably now and prepare the way nicely for a rather gradual change from
those legacy encodings into UTF-8, which would be just another ASCII-based
encoding in this scenario. It's too early for UTF-8 on general consumer
web pages, but the same web server could begin serving UTF-8 behind
the firewall, where we could be more daring.)

Would there be problems caused by leaving off the HTTP header charset
declaration and doing all the charset declarations in the HTML meta tag?
Would these problems be significant enough that some method really would
have to be found to include an HTTP header that matched the page's meta tag?
Would it actually be better to declare a wrong encoding in an HTTP header
than declare none at all, for some reason (still assuming all pages were
correctly meta tagged)?
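
The precedence rule Erik describes (ignore META charset when an HTTP charset
is present) can be sketched roughly as follows. This is an illustration of
the stated rule, not of what any particular browser version actually does.
The function names, the 1024-byte scan limit, and the ISO-8859-1 fallback
are all assumptions.

```python
import re

CHARSET_TOKEN = r"([A-Za-z0-9._:-]+)"


def charset_from_http(content_type):
    """Extract the charset parameter from a Content-Type header value, if any."""
    if not content_type:
        return None
    match = re.search(r'charset\s*=\s*"?' + CHARSET_TOKEN, content_type, re.IGNORECASE)
    return match.group(1).lower() if match else None


def charset_from_meta(body: bytes, limit: int = 1024):
    """Scan the first bytes of the page for a meta charset declaration.

    Only works for ASCII-compatible encodings, since the bytes are read as ASCII.
    """
    head = body[:limit].decode("ascii", errors="replace")
    match = re.search(r"<meta[^>]*charset\s*=\s*[\"']?" + CHARSET_TOKEN, head, re.IGNORECASE)
    return match.group(1).lower() if match else None


def effective_charset(content_type, body: bytes, default: str = "iso-8859-1"):
    """HTTP charset wins; the meta tag is consulted only if HTTP gives none."""
    return charset_from_http(content_type) or charset_from_meta(body) or default


page = b'<html><head><meta http-equiv="Content-Type" content="text/html; charset=shift_jis"></head></html>'

meta_only = effective_charset("text/html", page)
wrong_header = effective_charset("text/html; charset=iso-8859-1", page)
```

Under this rule, meta_only comes out as the page's own declaration, while
wrong_header comes out as the header's value: a wrong HTTP charset would
override a correct meta tag, which is why the "wrong header versus no header"
question matters.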

I'm leaving aside the question of non-ASCII-compatible encodings like
UTF-16, which obviously have different issues. If your meta tag is written
in UTF-16, somehow you're going to have to know the encoding before you can
read that meta tag, via HTTP, BOM, or some heuristic. It just doesn't seem
likely to me that any such encoding would be practical on a busy consumer
website that only serves one language per page, but has to have that page
work on a very wide range of browsers. I'm willing to put those encodings
aside for now in favor of ASCII-compatible encodings.
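
As a small illustration of why UTF-16 is a different problem, the Python
sketch below shows that a plain ASCII byte search cannot find the meta tag in
UTF-16 data, and that a BOM check is one way to learn the encoding first. The
function name sniff_utf16_bom is hypothetical, and the check is deliberately
simplistic.

```python
import codecs


def sniff_utf16_bom(data: bytes):
    """Return a UTF-16 byte-order label if the data starts with a UTF-16 BOM.

    Simplification: a UTF-32 little-endian BOM also begins with FF FE and is
    not distinguished here.
    """
    if data.startswith(codecs.BOM_UTF16_LE):
        return "utf-16-le"
    if data.startswith(codecs.BOM_UTF16_BE):
        return "utf-16-be"
    return None


meta = '<meta http-equiv="Content-Type" content="text/html; charset=utf-16">'

# Python's "utf-16" codec prepends a BOM in the machine's native byte order.
utf16_bytes = meta.encode("utf-16")
ascii_bytes = meta.encode("ascii")

# An ASCII substring search only succeeds on the ASCII-compatible bytes,
# because UTF-16 interleaves a zero byte with every ASCII character.
found_in_ascii = b"charset=" in ascii_bytes
found_in_utf16 = b"charset=" in utf16_bytes
```

Here found_in_ascii is true and found_in_utf16 is false, so a client scanning
raw bytes for the tag needs some other signal, such as the BOM, before the
meta declaration is even readable.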

What's the current "best practice"?

Thanks,
__Glen__



This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:20:59 EDT