Re: Encoding designation in Java Script sites

From: John Cowan (jcowan@reutershealth.com)
Date: Tue Apr 11 2000 - 16:54:47 EDT


"Addison Phillips [GSC]" wrote:

> Note that XML is natively Unicode by definition [although most XML books are
> amusingly silent about what that means: my copy of The XML Handbook, for
> example, says that XML is in Unicode and that there is an encoding called
> UTF-8 which is compatible with ASCII...... but frustratingly, it doesn't say
> what "XML is in Unicode" *means* in terms of actual disk file encoding or
> internal parsing...

It means that the character repertoire of XML documents is that of Unicode.
Any Unicode character, with stated exceptions (basically most of the C0
control characters) can be used in any XML document, no matter how the
document is represented, by using character references of the form &#x2019;.
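
As a rough illustration of that point, here is a small Python sketch using the standard library's xml.etree.ElementTree. The sample documents and variable names are made up for the example. It parses a document restricted to US-ASCII that reaches U+2019 through hexadecimal and decimal character references, and a UTF-8 document that contains the character literally, and checks that all three give the same text.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample documents; none of these come from the original thread.
# 1. US-ASCII bytes only, U+2019 written as a hexadecimal character reference.
ascii_hex = b'<?xml version="1.0" encoding="US-ASCII"?><p>it&#x2019;s</p>'
# 2. The same character as a decimal character reference (0x2019 == 8217).
ascii_dec = b'<?xml version="1.0" encoding="US-ASCII"?><p>it&#8217;s</p>'
# 3. UTF-8 bytes with the character present literally and no declaration.
utf8_literal = "<p>it\u2019s</p>".encode("utf-8")

# Parse each representation and collect the text of the root element.
texts = [ET.fromstring(doc).text for doc in (ascii_hex, ascii_dec, utf8_literal)]

# All three representations carry the same Unicode character.
assert texts == ["it\u2019s"] * 3
```

The on-disk bytes differ in each case, but the parser hands back the same Unicode string.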

> it turns out that most parsers use UCS-4 or UTF-16 in
> their rendering engine and smart implementers use UTF-8 when storing the
> actual XML files on disk. Yes, you have to declare the encoding for UTF-8.

UTF-8 need not be declared. UTF-16 need not be declared either, provided
a BOM is given. All other encodings (including BOM-less UTF-16LE and UTF-16BE)
must be declared. Declarations may be outside the document or inside it;
a declaration outside the document (in a MIME Content-type header, for example)
supersedes one inside it.
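
The precedence described above could be sketched roughly as follows in Python. This is only an illustration under stated assumptions: pick_encoding and its external parameter are hypothetical names, the internal declaration is sniffed with a simple regular expression that only works for ASCII-compatible encodings, and UTF-32 and EBCDIC detection are left out.

```python
import codecs
import re
from typing import Optional

# Matches an encoding pseudo-attribute inside a leading XML declaration.
# Assumption: the declaration itself is readable as ASCII bytes.
_DECL = re.compile(
    rb'^<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']'
)

def pick_encoding(data: bytes, external: Optional[str] = None) -> str:
    """Hypothetical helper: choose an encoding label for an XML byte stream."""
    # 1. A declaration outside the document (e.g. a MIME charset) wins.
    if external:
        return external
    # 2. UTF-16 needs no declaration when a byte order mark is present.
    #    (A UTF-32LE BOM also starts with FF FE; that case is not handled.)
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "UTF-16"
    # Skip an optional UTF-8 signature before looking for the declaration.
    if data.startswith(codecs.BOM_UTF8):
        data = data[len(codecs.BOM_UTF8):]
    # 3. Otherwise use the encoding declared inside the document, if any.
    match = _DECL.match(data)
    if match:
        return match.group(1).decode("ascii")
    # 4. With no BOM and no declaration, the document is taken as UTF-8.
    return "UTF-8"
```

For example, pick_encoding(b'<?xml version="1.0"?><a/>') falls through to "UTF-8", while passing external="Shift_JIS" returns that label regardless of what the document itself declares.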

-- 

Schlingt dreifach einen Kreis um dies! || John Cowan <jcowan@reutershealth.com>
Schliesst euer Aug vor heiliger Schau, || http://www.reutershealth.com
Denn er genoss vom Honig-Tau,          || http://www.ccil.org/~cowan
Und trank die Milch vom Paradies.            -- Coleridge (tr. Politzer)


