RE: UTF-8 and UTF-16

From: Marco.Cimarosti@icl.com
Date: Fri Oct 06 2000 - 04:20:36 EDT


I muttered this incomprehensible paragraph:
> - UTF-16 has 16-bit units ("words") and uses 1 or 2 units per
> character. Characters 000000 to 00FFFF use the corresponding
> word; higher values use a pair of "surrogates", the first one
> ("high") being in . It too exists in the same 3 variants as
> bove: little-endian, high-endian, and BOM-marked.

(The passage above demonstrates that even the FAQ of FAQs may be puzzling
if you cut away random chunks from it. ;-) Sorry, I'm a little bit under
pressure; this is what I meant:

- UTF-16 has 16-bit units ("words") and uses 1 or 2 units per character.
Characters 000000 to 00FFFF use the corresponding word; higher values use a
pair of "surrogates", the first one ("high") being in range D800 to DBFF,
the second one ("low") in range DC00 to DFFF. It too exists in the same 3
variants as above: little-endian, big-endian, and BOM-marked.
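
For anyone who prefers to see the arithmetic, here is a minimal sketch of the
rule described in the paragraph above. The function names (utf16_units,
to_bytes) are made up for illustration, and the 0x10000 offset and the
FEFF byte order mark are the usual UTF-16 conventions rather than anything
spelled out in this message.

```python
def utf16_units(code_point):
    """Return the 16-bit units (1 or 2) that represent one code point."""
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError("code point out of range")
    if 0xD800 <= code_point <= 0xDFFF:
        # Surrogate values are reserved for pairs and are not characters.
        raise ValueError("surrogate value is not a character")
    if code_point <= 0xFFFF:
        # 000000 to 00FFFF: the unit is the code point itself.
        return [code_point]
    # Higher values: split the 20-bit offset into two 10-bit halves.
    offset = code_point - 0x10000
    high = 0xD800 + (offset >> 10)    # "high" surrogate, D800 to DBFF
    low = 0xDC00 + (offset & 0x3FF)   # "low" surrogate, DC00 to DFFF
    return [high, low]


def to_bytes(units, byteorder, bom=False):
    """Serialize units as 'little' or 'big' endian, optionally BOM-marked."""
    if bom:
        # Assumed convention: the BOM is the unit FEFF written first.
        units = [0xFEFF] + list(units)
    return b"".join(unit.to_bytes(2, byteorder) for unit in units)
```

For example, utf16_units(0x41) gives a single unit, while
utf16_units(0x10000) gives the pair D800 DC00; to_bytes(units, "big") and
to_bytes(units, "little") then differ only in the byte order inside each unit.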

_ Marco


