Re: Encoding designation in Java Script sites

From: Markus Scherer (markus.scherer@jtcsv.com)
Date: Tue Apr 11 2000 - 17:03:53 EDT


"Addison Phillips [GSC]" wrote:
> what "XML is in Unicode" *means* in terms of actual disk file encoding or
> internal parsing... it turns out that most parsers use UCS-4 or UTF-16 in
> their rendering engine and smart implementers use UTF-8 when storing the
> actual XML files on disk. Yes, you have to declare the encoding for UTF-8.
> Byte Order Marks--0xFFFE--are the order of the day for UTF-16 files].
>

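the byte order mark is U+FEFF, not 0xFFFE. the byte pair FF FE is what U+FEFF looks like at the start of a little-endian utf-16 file; U+FFFE itself is a noncharacter.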
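
to make the byte-order point concrete, here is a minimal sketch in python (my choice of language for illustration, nothing xml-specific). the helper name sniff_utf16_byte_order is hypothetical, and it only looks at utf-16: a real detector would also have to rule out utf-32, whose little-endian bom starts with the same two bytes.

```python
import codecs

# U+FEFF serialized in each utf-16 byte order.
bom_be = "\ufeff".encode("utf-16-be")  # bytes FE FF
bom_le = "\ufeff".encode("utf-16-le")  # bytes FF FE


def sniff_utf16_byte_order(data: bytes):
    """Hypothetical helper: name the utf-16 byte order implied by a leading bom.

    Returns None when the data does not start with a utf-16 bom.
    Assumption: utf-32 input is out of scope for this sketch.
    """
    if data.startswith(codecs.BOM_UTF16_BE):
        return "utf-16-be"
    if data.startswith(codecs.BOM_UTF16_LE):
        return "utf-16-le"
    return None
```

a file that begins with FF FE is therefore little-endian utf-16 starting with U+FEFF, not a file that starts with the character U+FFFE.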

i believe that the xml (or dom?) specification also makes xml utf-16-centric: utf-8 and utf-16 are the two encodings that every xml parser must support and that need no encoding declaration, but text offsets are defined in terms of utf-16 code units, as far as i know. i would expect most parsers to use utf-16 internally.
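
a small sketch of what "offsets in utf-16 code units" means in practice, again in python and purely illustrative. the function names utf16_length and utf16_offset are hypothetical, not part of any xml or dom api; the sample string is an assumption chosen to contain one character outside the bmp, which takes two utf-16 code units.

```python
def utf16_length(text: str) -> int:
    """Number of utf-16 code units needed to represent text."""
    # Each utf-16 code unit is two bytes; encode without a bom and count.
    return len(text.encode("utf-16-le")) // 2


def utf16_offset(text: str, code_point_index: int) -> int:
    """Convert a code point index into an offset in utf-16 code units."""
    return utf16_length(text[:code_point_index])


# "a", then U+10400 (outside the bmp, so a surrogate pair in utf-16), then "b".
sample = "a\U00010400b"

code_points = len(sample)                 # python counts code points
code_units = utf16_length(sample)         # what a utf-16-based api would count
utf8_bytes = len(sample.encode("utf-8"))  # size as stored in utf-8
offset_of_b = utf16_offset(sample, 2)     # position of "b" in utf-16 code units
```

with this sample, the three counts differ (3 code points, 4 utf-16 code units, 6 utf-8 bytes), and "b" sits at code point index 2 but utf-16 offset 3. that difference is why an api defined on utf-16 code units is easiest to implement on top of utf-16 storage.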

markus



This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:21:01 EDT