From: Addison Phillips [GSC] (
Date: Tue Apr 11 2000 - 14:55:28 EDT

Hi Suzanne,

JavaScript-based sites are essentially HTML sites, so the encoding is
declared either in the HTTP header or in a META tag (or not at all). A META
tag is problematic because it can force the browser's parser to re-read the
page from the beginning, scrapping any data that was previously interpreted
and restarting the JavaScript parser (if you've already started processing
a script). Not declaring it at all is bad for the obvious reasons. The best
place for the declaration is the HTTP Content-Type header (where it is
essentially invisible to the end-user: you won't see it when viewing the
page source).
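
To make the header case concrete, here is a minimal sketch in Python of
pulling the charset out of a Content-Type header value. The function name
charset_from_content_type is made up for illustration, and the header
strings are sample values, not taken from any real site.

```python
from email.message import Message


def charset_from_content_type(header_value):
    """Return the charset parameter of a Content-Type value, or None.

    Hypothetical helper for illustration; it leans on the standard
    library's MIME header parsing rather than hand-written splitting.
    """
    msg = Message()
    msg["Content-Type"] = header_value
    # get_content_charset() lowercases the value and returns None
    # when the header carries no charset parameter.
    return msg.get_content_charset()


print(charset_from_content_type("text/html; charset=Big5"))
print(charset_from_content_type("text/html"))
```

When the second form comes back (no charset at all), the browser is left to
fall back on a META tag or on guessing.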

XML uses this declaration, at the very start of the file, to indicate how
to decode it:

<?xml version="1.0" encoding="Big5"?>
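
As a rough sketch of how a reader might pick that value up before handing
the file to a real parser, the Python below looks for the encoding
pseudo-attribute at the start of the raw bytes. The name declared_encoding
is hypothetical, and the approach assumes an ASCII-compatible encoding such
as Big5 or UTF-8 (it will not see a declaration stored as UTF-16).

```python
import re

# Matches an XML declaration at the very start of the data and captures
# the value of its encoding pseudo-attribute.
_XML_DECL = re.compile(
    rb'^<\?xml\s+version\s*=\s*["\'][^"\']*["\']'
    rb'\s+encoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']'
)


def declared_encoding(data):
    """Return the declared encoding name as a str, or None if absent."""
    match = _XML_DECL.match(data)
    return match.group(1).decode("ascii") if match else None


print(declared_encoding(b'<?xml version="1.0" encoding="Big5"?><doc/>'))
print(declared_encoding(b"<doc/>"))
```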

Note that XML is natively Unicode by definition [although most XML books are
amusingly silent about what that means: my copy of The XML Handbook, for
example, says that XML is in Unicode and that there is an encoding called
UTF-8 which is compatible with ASCII... but frustratingly, it doesn't say
what "XML is in Unicode" *means* in terms of actual disk file encoding or
internal parsing. It turns out that most parsers use UCS-4 or UTF-16
internally, and smart implementers use UTF-8 when storing the actual XML
files on disk. Strictly speaking, UTF-8 and UTF-16 are the defaults, so a
UTF-8 file does not have to declare its encoding, but declaring it anyway
is a good habit. Byte Order Marks (U+FEFF, which shows up as the bytes
FE FF or FF FE depending on byte order) are the order of the day for UTF-16
files].
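
For the Byte Order Mark, a small Python sketch of the check looks like this.
The function name sniff_bom is invented for the example, and it deliberately
ignores UTF-32 (whose little-endian mark begins with the same two bytes as
the UTF-16 one), so treat it as an illustration rather than a complete
detector.

```python
import codecs


def sniff_bom(data):
    """Return (codec_name, bom_length) for a leading BOM, else (None, 0).

    Hypothetical helper; only UTF-8 and UTF-16 marks are recognised.
    """
    if data.startswith(codecs.BOM_UTF8):      # EF BB BF
        return "utf-8", 3
    if data.startswith(codecs.BOM_UTF16_BE):  # FE FF
        return "utf-16-be", 2
    if data.startswith(codecs.BOM_UTF16_LE):  # FF FE
        return "utf-16-le", 2
    return None, 0


raw = codecs.BOM_UTF16_LE + "<doc/>".encode("utf-16-le")
encoding, skip = sniff_bom(raw)
# Drop the mark itself before decoding so it does not end up in the text.
text = raw[skip:].decode(encoding)
print(encoding, text)
```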

The declared encoding tells the parser how to decode the bytes of the disk
file into its internal "Unicode" representation [I'm grossly oversimplifying
here, of course]. The XML experts on this list can describe this process
much more succinctly than I can, probably...
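
A tiny Python illustration of that idea, under the assumption that the
"internal Unicode" is simply Python's own str type: the same two characters
are stored as different bytes in Big5 and in UTF-8, and decoding each with
the right encoding name yields the same internal text. The sample characters
are arbitrary.

```python
# Two CJK characters, written as escapes so this listing stays ASCII.
text = "\u4e2d\u6587"

big5_bytes = text.encode("big5")    # what a Big5 file would hold on disk
utf8_bytes = text.encode("utf-8")   # what a UTF-8 file would hold on disk

# The on-disk bytes differ...
print(big5_bytes != utf8_bytes)

# ...but decoding each with its declared encoding gives the same text.
from_big5 = big5_bytes.decode("big5")
from_utf8 = utf8_bytes.decode("utf-8")
print(from_big5 == from_utf8 == text)
```

Decoding the Big5 bytes with the wrong name (say, as UTF-8) either raises an
error or produces garbage, which is why the declaration matters.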



