Re: Encoding designation in Java Script sites

From: Lars Marius Garshol (larsga@garshol.priv.no)
Date: Wed Apr 12 2000 - 04:11:38 EDT

Next message: Marco.Cimarosti@icl.com: "RE: IDS rendering and IDS analysis (was RE: Problems/Issues with"
Previous message: Lars Marius Garshol: "Re: Encoding designation in Java Script sites"
Maybe in reply to: Suzanne Topping: "Encoding designation in Java Script sites"

* Markus Scherer
|
| i believe that the xml (or dom?) specification also makes xml
| utf-16-centric: utf-8 is one of the two default encodings (utf-8 &
| utf-16), but text offsets are defined in terms of utf-16 code units,
| as far as i know. i would expect most parsers to use utf-16
| internally.

There is nothing inherently UTF-16-centric about XML, since there are
no text offsets or anything like it in XML itself. Parsers do have to

- convert character references like '&#x4132;' to the actual character and

- for each character in the document verify that it is within the
allowed character ranges

However, I wouldn't call either of these UTF-16-centric.
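The two parser duties above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not how any particular parser is structured: the function names are hypothetical, it follows the XML 1.0 "Char" production, and it assumes `text` is ordinary character data (no entity references or CDATA sections). Both checks work purely on code points and never mention UTF-16.

```python
import re

def is_xml_char(cp: int) -> bool:
    """True if code point cp matches the XML 1.0 Char production."""
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )

# Decimal (&#16690;) or hexadecimal (&#x4132;) character reference.
CHAR_REF = re.compile(r"&#(?:x([0-9A-Fa-f]+)|([0-9]+));")

def expand_char_refs(text: str) -> str:
    """Replace each character reference with the character it names."""
    def replace(m):
        # Group 1 holds hex digits, group 2 decimal digits.
        cp = int(m.group(1), 16) if m.group(1) else int(m.group(2))
        # A reference must also name an allowed character.
        if not is_xml_char(cp):
            raise ValueError(f"reference to disallowed character U+{cp:04X}")
        return chr(cp)
    return CHAR_REF.sub(replace, text)

def check_chars(text: str) -> None:
    """Raise ValueError if any literal character is outside the Char ranges."""
    for ch in text:
        if not is_xml_char(ord(ch)):
            raise ValueError(f"disallowed character U+{ord(ch):04X}")
```

For example, `expand_char_refs("A&#x4132;B")` and `expand_char_refs("A&#16690;B")` both yield "A䄲B", while `expand_char_refs("&#1;")` is rejected.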

The DOM specification, OTOH, does explicitly specify that UTF-16
should be used internally. The CharacterData interface uses offsets,
so here there is a clear UTF-16 bias. DOM Level 1 doesn't clearly
specify how to interpret these offsets, but Level 2 contains text to
the effect that they count 16-bit units rather than characters.
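To make the difference concrete, here is a small Python sketch (Python strings index by code point, so the conversion has to be explicit). The function names are hypothetical and not part of any DOM binding; the sketch assumes well-formed text and simply counts two 16-bit units for every character above U+FFFF, which is what UTF-16 does with surrogate pairs.

```python
def utf16_units(ch: str) -> int:
    """Number of UTF-16 code units needed for one code point."""
    return 2 if ord(ch) > 0xFFFF else 1

def utf16_length(s: str) -> int:
    """Length of s in 16-bit units, the way DOM Level 2 offsets count."""
    return sum(utf16_units(ch) for ch in s)

def utf16_offset(s: str, char_index: int) -> int:
    """Convert a character (code point) index into a UTF-16 offset."""
    return utf16_length(s[:char_index])

def char_index(s: str, offset: int) -> int:
    """Convert a UTF-16 offset back into a character index."""
    if offset < 0:
        raise IndexError("negative offset")
    units = 0
    for i, ch in enumerate(s):
        if units == offset:
            return i
        units += utf16_units(ch)
        # Overshooting means the offset pointed at the second half of a pair.
        if units > offset:
            raise ValueError("offset falls inside a surrogate pair")
    if units == offset:
        return len(s)
    raise IndexError("offset past end of string")
```

With `s = "a\U0001D11Eb"` (U+1D11E is MUSICAL SYMBOL G CLEF), `len(s)` is 3 but `utf16_length(s)` is 4, so "b" is at character index 2 but at 16-bit offset 3, and offset 2 points into the middle of the surrogate pair. Java's `String.length()` and JavaScript string indices both count UTF-16 code units, so in those languages DOM offsets line up with the native string API directly.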

--Lars M.


This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:21:01 EDT