* Markus Scherer
|
| i believe that the xml (or dom?) specification also makes xml
| utf-16-centric: utf-8 is one of the two default encodings (utf-8 &
| utf-16), but text offsets are defined in terms of utf-16 code units,
| as far as i know. i would expect most parsers to use utf-16
| internally.
There is nothing inherently UTF-16-centric about XML, since there are
no text offsets or anything like it in XML itself. Parsers do have to
- convert character references like '&#x4132;' to the actual
  character, and
- verify, for each character in the document, that it is within the
  allowed character ranges.
However, I wouldn't really call this being UTF-16-centric: both checks
are defined on code points, not on UTF-16 code units (a sketch of both
steps follows below).
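To make the two duties concrete, here is a minimal Java sketch (my own
illustration, not taken from any parser; the names XmlCharCheck,
resolveCharRef and isXmlChar are made up). It resolves a numeric
character reference to a code point and checks it against the XML 1.0
Char production; note that neither step involves UTF-16 offsets.

  // Hypothetical sketch: resolve a numeric character reference and
  // check the result against the XML 1.0 Char production.
  public class XmlCharCheck {

      // Resolve "&#x4132;" (hex) or "&#16690;" (decimal) to a code point.
      static int resolveCharRef(String ref) {
          String body = ref.substring(2, ref.length() - 1); // strip "&#" and ";"
          if (body.startsWith("x") || body.startsWith("X")) {
              return Integer.parseInt(body.substring(1), 16);
          }
          return Integer.parseInt(body, 10);
      }

      // XML 1.0 Char production:
      // #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
      static boolean isXmlChar(int cp) {
          return cp == 0x9 || cp == 0xA || cp == 0xD
              || (cp >= 0x20 && cp <= 0xD7FF)
              || (cp >= 0xE000 && cp <= 0xFFFD)
              || (cp >= 0x10000 && cp <= 0x10FFFF);
      }

      public static void main(String[] args) {
          int cp = resolveCharRef("&#x4132;");
          System.out.println(Integer.toHexString(cp) + " allowed: " + isXmlChar(cp));
      }
  }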
The DOM specification, OTOH, does explicitly specify that UTF-16
should be used internally. The CharacterData interface does use
offsets, so here there is a clear UTF-16 bias. DOM Level 1 doesn't
clearly specify how to interpret these offsets, but Level 2 adds text
to the effect that they refer to 16-bit units rather than characters.
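To see what the 16-bit interpretation means in practice, here is a
small Java illustration (mine, not from the DOM spec; Java strings,
like DOMString, are sequences of 16-bit code units). A supplementary
character counts as two toward any such offset, so a call like
CharacterData.substringData(0, 2) on the data below would return just
the surrogate pair, not two characters.

  public class DomOffsetExample {
      public static void main(String[] args) {
          // One character outside the BMP (U+10400) followed by one BMP character.
          String data = new StringBuilder()
              .appendCodePoint(0x10400)
              .append("A")
              .toString();

          System.out.println("characters:        " + data.codePointCount(0, data.length())); // 2
          System.out.println("UTF-16 code units: " + data.length());                         // 3

          // An offset/count of (0, 2) in 16-bit units covers only the
          // surrogate pair, i.e. a single character.
          System.out.println(data.substring(0, 2).codePointCount(0, 2));                     // 1
      }
  }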
--Lars M.