Re: Unicode conformant character encodings and us-ascii

From: Kenneth Whistler (kenw@sybase.com)
Date: Wed May 14 2003 - 19:41:41 EDT

Next message: Yael.Aharon@nokia.com: "RE: Unicode conformant character encodings and us-ascii"

Previous message: Rick McGowan: "Re: Unicode conformant character encodings and us-ascii"
Maybe in reply to: Yael.Aharon@nokia.com: "Unicode conformant character encodings and us-ascii"
Next in thread: Yael.Aharon@nokia.com: "RE: Unicode conformant character encodings and us-ascii"

> Does anyone know if all character encodings that conform to the Unicode spec

There is only *one* character encoding that conforms to the "Unicode spec",
namely, the Unicode character encoding.

> reserve 0x00 - 0x7F to us-ascii characters?

But from this, I infer what you are trying to get at is whether UTF-8,
UTF-16, UTF-32 (each of which is an encoding *form* of the Unicode
character encoding) all reserve those values as ASCII characters.

For the character *encoding*, the answer is yes: U+0000..U+007F are
exactly identical to the characters of ASCII.
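
As a quick illustration of that identity (a minimal sketch, assuming
Python 3 and its built-in "ascii" codec; the variable names are only
for illustration), decoding every ASCII byte value yields the Unicode
code point with the same number:

```python
# Every 7-bit ASCII value, 0x00 through 0x7F, as raw bytes.
ascii_bytes = bytes(range(0x80))

# Decode them as ASCII and look at the resulting Unicode code points.
decoded = ascii_bytes.decode("ascii")
code_points = [ord(ch) for ch in decoded]

# Each ASCII value maps to the Unicode code point with the same number.
print(code_points == list(range(0x80)))
```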

For the character encoding *forms*, the answer is no.

In UTF-8, which uses 8-bit code units, 0x00..0x7F are always used
only for U+0000..U+007F, respectively. But for UTF-16, which uses
16-bit code units, and UTF-32, which uses 32-bit code units, the
individual byte values are meaningless on their own, and you could
encounter a 0x00..0x7F byte value anywhere within a code unit
without it having anything to do with ASCII. For example, U+0141
serialized as big-endian UTF-16 is the byte pair 0x01 0x41, and
that 0x41 byte is not an ASCII "A".
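
The difference between the three encoding forms can be checked with a
small sketch like the one below. It assumes Python 3 and its standard
"utf-8", "utf-16-be" and "utf-32-be" codecs; the helper name
ascii_range_bytes and the sample character U+4E2D are my own choices
for illustration, not anything defined by the standard.

```python
def ascii_range_bytes(text, encoding):
    """Return the bytes of the encoded text that fall in 0x00..0x7F."""
    data = text.encode(encoding)
    return [b for b in data if b <= 0x7F]

# U+4E2D is a CJK ideograph, so it is not an ASCII character.
sample = "\u4e2d"

for name in ("utf-8", "utf-16-be", "utf-32-be"):
    encoded = sample.encode(name)
    low = ascii_range_bytes(sample, name)
    # Show the serialized bytes and which of them look like ASCII values.
    print(name, encoded.hex(), [hex(b) for b in low])
```

In UTF-8 the list comes back empty, because every byte of a multi-byte
sequence is 0x80 or above. In the big-endian UTF-16 and UTF-32
serializations, bytes such as 0x4E and 0x2D (and the 0x00 padding in
UTF-32) appear even though no ASCII character is present.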

> If there a spec that require this behavior, which spec is it?

The Unicode Standard. ;-)

See:

http://www.unicode.org/book/preview/ch03.pdf

and, in particular, Section 3.9, Encoding Forms.

--Ken

> Or can anyone give me an example of a conformant character
> encoding that does not reserve these bytes to us-ascii?
> thanks
>

This archive was generated by hypermail 2.1.5 : Wed May 14 2003 - 20:23:06 EDT