Re: Encoding designation in Java Script sites

From: Addison Phillips [GSC] (addison@globalsight.com)
Date: Tue Apr 11 2000 - 18:05:24 EDT


>
> > Note that XML is natively Unicode by definition [although most XML books
> > are amusingly silent about what that means: my copy of The XML Handbook,
> > for example, says that XML is in Unicode and that there is an encoding
> > called UTF-8 which is compatible with ASCII... but frustratingly, it
> > doesn't say what "XML is in Unicode" *means* in terms of actual disk file
> > encoding or internal parsing...
>
> It means that the character repertoire of XML documents is that of Unicode.
> Any Unicode character, with stated exceptions (basically most of the C0
> control characters) can be used in any XML document, no matter how the
> document is represented, by using character references of the form
> &#2019;.

Well, yes, but why doesn't the documentation *say* that? That makes ASCII
the encoding ;-). Most basic (and much high-level) XML documentation dances
lightly around the encoding issue as if "Unicode" were an encoding. The lack
of clarity about encodings leads to general confusion if it isn't nailed
down. And people ask imprecise questions like "where can I get a Unicode
editor?"...
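
To make the character-reference point concrete, here is a minimal sketch,
assuming Python and its standard xml.etree.ElementTree parser (neither is
part of the original discussion, and the sample document is made up). The
file on disk is pure ASCII, yet the parsed text contains U+2019, written
here with the hexadecimal form of a character reference:

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document: every byte is ASCII. The right single
# quotation mark is written as a numeric character reference instead of
# being stored as an encoded character.
ascii_doc = (
    b'<?xml version="1.0" encoding="US-ASCII"?>'
    b"<note>It&#x2019;s Unicode</note>"
)

# The on-disk representation really is 7-bit clean.
is_pure_ascii = all(byte < 0x80 for byte in ascii_doc)

# After parsing, the reference has been resolved to the Unicode character.
text = ET.fromstring(ascii_doc).text
code_point = ord(text[2])
```

So the repertoire is Unicode even though the encoding of the file is ASCII,
which is the distinction the documentation tends to blur.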

>
> > it turns out that most parsers use UCS-4 or UTF-16 in
> > their rendering engine and smart implementers use UTF-8 when storing the
> > actual XML files on disk. Yes, you have to declare the encoding for
> > UTF-8.
>
> UTF-8 need not be declared.

Doh! I knew that...
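
For the record, a quick sketch of that, again assuming Python's standard
xml.etree.ElementTree parser and a made-up sample document: the same UTF-8
bytes parse identically with and without an encoding declaration, since
UTF-8 is what a parser assumes when nothing is declared.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample content containing non-ASCII characters.
body = "<note>caf\u00e9 \u2019</note>"

# Same UTF-8 bytes, once with no XML declaration at all and once with an
# explicit encoding declaration in front.
undeclared = body.encode("utf-8")
declared = b'<?xml version="1.0" encoding="UTF-8"?>' + body.encode("utf-8")

text_undeclared = ET.fromstring(undeclared).text
text_declared = ET.fromstring(declared).text
same_result = text_undeclared == text_declared
```

The declaration only becomes necessary for encodings other than UTF-8 and
UTF-16.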

thanks,

Addison


