From: Marcin 'Qrczak' Kowalczyk (qrczak@knm.org.pl)
Date: Fri Jan 21 2005 - 19:17:14 CST
"Arcane Jill" <arcanejill@ramonsky.com> writes:
> D41: UTF-16LE encoding scheme: The Unicode encoding scheme that serializes a
> UTF-16 code unit sequence as a byte sequence in little-endian format.
> * In UTF-16LE, the UTF-16 code unit sequence <004D 0430 4E8C D800 DF02> is
> serialized as <4D 00 30 04 8C 4E 00 D8 02 DF>.
> * In UTF-16LE, an initial byte sequence <FF FE> is interpreted as U+FEFF ZERO
> WIDTH NO-BREAK SPACE.
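
To make the quoted serialization example concrete, here is a minimal sketch, assuming Python 3. The helper name serialize_utf16le is hypothetical and used only for illustration; it packs each 16-bit code unit least significant byte first.

```python
import struct

# Code unit sequence from the quoted definition D41.
code_units = [0x004D, 0x0430, 0x4E8C, 0xD800, 0xDF02]

def serialize_utf16le(units):
    """Serialize 16-bit code units as bytes, least significant byte first."""
    return struct.pack("<%dH" % len(units), *units)

le_bytes = serialize_utf16le(code_units)

# Print the bytes as space-separated hex pairs for comparison with the quote.
print(" ".join("%02X" % b for b in le_bytes))
# prints: 4D 00 30 04 8C 4E 00 D8 02 DF
```

The surrogate pair <D800 DF02> is serialized unit by unit, so it appears as <00 D8 02 DF>.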
(Below I talk about encoding schemes.)
In UTF-16LE and UTF-16BE there is no BOM, while in UTF-16 an optional
initial FEFF is a BOM.
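
The difference can be observed with a decoder that follows this distinction. The sketch below assumes Python 3, whose "utf-16" and "utf-16-le" codecs behave this way; the codec names are Python's, not names defined by the Unicode Standard.

```python
# Bytes FF FE followed by "M" (U+004D) in little-endian order.
data = b"\xff\xfe\x4d\x00"

# "utf-16": the initial FF FE is a BOM; it selects the byte order and is consumed.
as_utf16 = data.decode("utf-16")

# "utf-16-le": no BOM handling; FF FE is decoded as the character U+FEFF.
as_utf16le = data.decode("utf-16-le")

print(repr(as_utf16))    # 'M'
print(repr(as_utf16le))  # '\ufeffM'
```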
Why is there only one kind of UTF-8, then? It would be consistent if it
had variants like UTF-16 and UTF-32: one variant without special BOM
handling, analogous to UTF-16LE and UTF-16BE (obviously only one flavor
is needed, because byte order issues don't apply), and one variant with
it, analogous to UTF-16.
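
As an illustration of what such a pair of variants could look like in practice, Python 3 offers two codecs: "utf-8", which does no BOM handling, and "utf-8-sig", which treats an initial EF BB BF as a signature. These are Python codec names, assumed here purely as an example; they are not encoding schemes defined by the Unicode Standard.

```python
# EF BB BF is U+FEFF encoded in UTF-8, followed by "M".
data = b"\xef\xbb\xbfM"

# No special handling: the initial U+FEFF is kept as a character.
plain = data.decode("utf-8")

# Signature-aware: the initial U+FEFF is consumed, like the BOM in UTF-16.
with_sig = data.decode("utf-8-sig")

print(repr(plain))     # '\ufeffM'
print(repr(with_sig))  # 'M'

# When encoding, the signature-aware codec writes the signature itself.
encoded = "M".encode("utf-8-sig")
```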
-- 
   __("<         Marcin Kowalczyk
   \__/       qrczak@knm.org.pl
    ^^     http://qrnik.knm.org.pl/~qrczak/