Re: Encoding designation in Java Script sites

From: Addison Phillips [GSC] (
Date: Tue Apr 11 2000 - 18:05:24 EDT

> > Note that XML is natively Unicode by definition [although most XML books
> > are amusingly silent about what that means: my copy of The XML Handbook, for
> > example, says that XML is in Unicode and that there is an encoding
> > UTF-8 which is compatible with ASCII...... but frustratingly, it doesn't
> > say what "XML is in Unicode" *means* in terms of actual disk file encoding
> > or internal parsing...
> It means that the character repertoire of XML documents is that of Unicode.
> Any Unicode character, with stated exceptions (basically most of the C0
> control characters) can be used in any XML document, no matter how the
> document is represented, by using character references of the form

Well, yes, but why doesn't the documentation *say* that? That makes ASCII
the encoding ;-). Most basic (and much high-level) XML documentation dances
lightly around the encoding issue as if "Unicode" were an encoding. The lack
of clarity about encodings leads to general confusion if it isn't nailed
down. And people ask imprecise questions like "where can I get a Unicode
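To make the repertoire-versus-encoding distinction concrete, here is a minimal sketch in Python (standard library only; the sample text and the helper names `encoded_forms` and `root_text` are made up for illustration). The same three non-ASCII characters are serialized as UTF-8, UTF-16 and UTF-32 (which, for these characters, is the same fixed 4-byte layout as UCS-4), and then carried in an XML document whose bytes are pure ASCII by way of numeric character references. In every case the parser hands back the same Unicode string.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample text: "café 日本" (U+00E9, U+65E5, U+672C).
TEXT = "caf\u00e9 \u65e5\u672c"


def encoded_forms(text: str) -> dict[str, bytes]:
    """Serialize the same characters in several Unicode encoding forms."""
    return {codec: text.encode(codec) for codec in ("utf-8", "utf-16-be", "utf-32-be")}


# The same characters in a document whose bytes are all ASCII: the
# non-ASCII characters are written as numeric character references
# (hex for U+00E9 and U+65E5, decimal 26412 for U+672C).
ASCII_DOC = (
    b'<?xml version="1.0" encoding="US-ASCII"?>'
    b"<p>caf&#xE9; &#x65E5;&#26412;</p>"
)


def root_text(xml_bytes: bytes) -> str:
    """Parse an XML document and return its root element's text."""
    return ET.fromstring(xml_bytes).text


if __name__ == "__main__":
    # Different byte sequences on disk, one character sequence in memory.
    for codec, data in encoded_forms(TEXT).items():
        print(f"{codec:9} {len(data):2} bytes: {data.hex(' ')}")
    # Pure-ASCII bytes, yet the parsed text contains non-ASCII characters.
    print(all(b < 0x80 for b in ASCII_DOC), root_text(ASCII_DOC) == TEXT)
```

The three encoded forms take 12, 14 and 28 bytes respectively, yet all decode to the same seven characters, which is why "Unicode" by itself says nothing about what is actually stored in the file.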

> > it turns out that most parsers use UCS-4 or UTF-16 in
> > their rendering engine and smart implementers use UTF-8 when storing the
> > actual XML files on disk. Yes, you have to declare the encoding for
> UTF-8 need not be declared.

Doh! I knew that...
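The point that UTF-8 need not be declared follows from the XML 1.0 rule that an entity beginning with neither a byte order mark nor an encoding declaration must be in UTF-8. A minimal sketch with Python's standard `xml.etree.ElementTree` (which wraps the expat parser) shows the consequence; the sample text, variable names and helpers here are hypothetical. UTF-8 and BOM-marked UTF-16 parse without a declaration, while ISO-8859-1 only parses once it is declared.

```python
import xml.etree.ElementTree as ET

TEXT = "caf\u00e9 \u65e5\u672c"  # hypothetical sample: "café 日本"

# UTF-8 with no XML declaration: allowed, since UTF-8 is the default.
UTF8_DOC = f"<p>{TEXT}</p>".encode("utf-8")

# UTF-16 with no declaration: Python's "utf-16" codec prepends a byte
# order mark, and the BOM is what lets the parser recognize UTF-16.
UTF16_DOC = f"<p>{TEXT}</p>".encode("utf-16")

# ISO-8859-1 (Latin-1): the byte 0xE9 for "é" is not valid UTF-8, so
# without a declaration the parser rejects the document as malformed.
LATIN1_UNDECLARED = "<p>caf\u00e9</p>".encode("iso-8859-1")
LATIN1_DECLARED = b'<?xml version="1.0" encoding="ISO-8859-1"?>' + LATIN1_UNDECLARED


def root_text(xml_bytes: bytes) -> str:
    """Return the text of the document's root element."""
    return ET.fromstring(xml_bytes).text


def parses(xml_bytes: bytes) -> bool:
    """True if the parser accepts the bytes as a well-formed document."""
    try:
        ET.fromstring(xml_bytes)
    except ET.ParseError:
        return False
    return True


if __name__ == "__main__":
    for name, doc in [
        ("UTF8_DOC", UTF8_DOC),
        ("UTF16_DOC", UTF16_DOC),
        ("LATIN1_UNDECLARED", LATIN1_UNDECLARED),
        ("LATIN1_DECLARED", LATIN1_DECLARED),
    ]:
        print(name, parses(doc))
```

Only the undeclared Latin-1 document is rejected; adding the `encoding="ISO-8859-1"` declaration is enough to make the same bytes parse to "café".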



This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:21:01 EDT