From: Kenneth Whistler (kenw@sybase.com)
Date: Wed May 14 2003 - 19:41:41 EDT
> Does anyone know if all character encodings that conform to the Unicode spec
There is only *one* character encoding that conforms to the "Unicode spec",
namely, the Unicode character encoding.
> reserve 0x00 - 0x7F to us-ascii characters?
But from this, I infer what you are trying to get at is whether UTF-8,
UTF-16, UTF-32 (each of which is an encoding *form* of the Unicode
character encoding) all reserve those values as ASCII characters.
For the character *encoding*, the answer is yes: U+0000..U+007F are
exactly identical to the characters of ASCII.
For the character encoding *forms*, the answer is no.
In UTF-8, which uses 8-bit code units, 0x00..0x7F are always used
only for U+0000..U+007F, respectively. But for UTF-16, which uses
16-bit code units, and UTF-32, which uses 32-bit code units, the
individual byte values have no meaning on their own, and you could
encounter a 0x00..0x7F byte value anywhere inside a code unit,
where it would have nothing to do with ASCII values.
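
As an illustration of the point above, here is a minimal Python sketch. It
is an assumption-laden example, not anything taken from the Unicode Standard:
the helper name ascii_range_bytes and the choice of U+0141 as the sample
character are made up for this demonstration, and the big-endian encoding
schemes are used only to make the byte order easy to read.

```python
def ascii_range_bytes(text, encoding):
    """Return the bytes of `text`, serialized in `encoding`,
    whose values fall in the range 0x00..0x7F.
    (Hypothetical helper, for illustration only.)"""
    return [b for b in text.encode(encoding) if b <= 0x7F]


# U+0141 LATIN CAPITAL LETTER L WITH STROKE is not an ASCII character.
ch = "\u0141"

# UTF-8: 8-bit code units; a non-ASCII character never produces
# a code unit in 0x00..0x7F.
print(ch.encode("utf-8").hex())      # c581

# UTF-16: one 16-bit code unit 0x0141; serialized big-endian, its
# bytes are 0x01 and 0x41, and 0x41 happens to be ASCII "A".
print(ch.encode("utf-16-be").hex())  # 0141

# UTF-32: one 32-bit code unit 0x00000141; every byte is below 0x80.
print(ch.encode("utf-32-be").hex())  # 00000141

for enc in ("utf-8", "utf-16-be", "utf-32-be"):
    print(enc, [hex(b) for b in ascii_range_bytes(ch, enc)])
```

Only the UTF-8 line reports no bytes in the 0x00..0x7F range. In the other
two, bytes such as 0x41 show up even though the text contains no ASCII
character at all, which is why byte-level ASCII scanning is safe on UTF-8
but not on UTF-16 or UTF-32 data.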
> If there is a spec that requires this behavior, which spec is it?
The Unicode Standard. ;-)
See:
http://www.unicode.org/book/preview/ch03.pdf
and, in particular, Section 3.9, Encoding Forms.
--Ken
> Or can anyone give me an example of a conformant character
> encoding that does not reserve these bytes to us-ascii?
> thanks
>