RE: Unicode conformant character encodings and us-ascii

From: Yael.Aharon@nokia.com
Date: Wed May 14 2003 - 20:26:22 EDT

Next message: Eugene Mandel: "weird UTF-8 encoding in MS Exchange 2000 IM client"

Previous message: Kenneth Whistler: "Re: Unicode conformant character encodings and us-ascii"
Maybe in reply to: Yael.Aharon@nokia.com: "Unicode conformant character encodings and us-ascii"
Next in thread: Philippe Verdy: "Re: Unicode conformant character encodings and us-ascii"
Reply: Philippe Verdy: "Re: Unicode conformant character encodings and us-ascii"
Reply: Otto Stolz: "8-bit encodings and ASCII (was: Unicode conformant character encodings and us-ascii)"

I see now why you thought the question was odd. I actually meant to ask about the various ISO (e.g. the 8859 variants) and Windows character encodings.
Thanks

> Does anyone know if all character encodings that conform to the Unicode spec

There is only *one* character encoding that conforms to the "Unicode spec",
namely, the Unicode character encoding.

> reserve 0x00 - 0x7F to us-ascii characters?

But from this, I infer what you are trying to get at is whether UTF-8,
UTF-16, UTF-32 (each of which is an encoding *form* of the Unicode
character encoding) all reserve those values as ASCII characters.

For the character *encoding*, the answer is yes: U+0000..U+007F are
exactly identical to the characters of ASCII.
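
A minimal Python sketch of this point, using only the built-in "ascii" codec and chr(). The variable names are illustrative and not part of the original message. It decodes every ASCII byte value and checks that the result is the Unicode character with the same number:

```python
# Decode all 128 ASCII byte values with the ASCII codec.
ascii_chars = bytes(range(0x80)).decode("ascii")

# Build the Unicode characters U+0000..U+007F directly from code points.
unicode_chars = "".join(chr(cp) for cp in range(0x80))

# The two strings are the same, character for character.
same = ascii_chars == unicode_chars
print(same)
```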

For the character encoding *forms*, the answer is no.

In UTF-8, which uses 8-bit code units, 0x00..0x7F are always used
only for U+0000..U+007F, respectively. But for UTF-16, which uses
16-bit code units, and UTF-32, which uses 32-bit code units, the
individual byte values carry no meaning on their own: a byte in the
range 0x00..0x7F can occur anywhere inside a code unit
and have nothing to do with ASCII values.
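
A small Python sketch of the difference, assuming the standard "utf-8", "utf-16-be" and "utf-32-be" codecs. The helper name ascii_range_bytes and the sample character U+4E2D are illustrative choices, not part of the original message. It collects the bytes in 0x00..0x7F that appear when a single non-ASCII character is serialized in each encoding form:

```python
def ascii_range_bytes(text: str, codec: str) -> list:
    """Return the bytes of `text` encoded with `codec` that fall in 0x00..0x7F."""
    return [b for b in text.encode(codec) if b <= 0x7F]

# U+4E2D is a CJK ideograph, not an ASCII character.
sample = "\u4e2d"

utf8_hits = ascii_range_bytes(sample, "utf-8")       # bytes E4 B8 AD
utf16_hits = ascii_range_bytes(sample, "utf-16-be")  # bytes 4E 2D
utf32_hits = ascii_range_bytes(sample, "utf-32-be")  # bytes 00 00 4E 2D

print(utf8_hits)   # no byte in the ASCII range
print(utf16_hits)  # 0x4E and 0x2D look like ASCII "N" and "-"
print(utf32_hits)  # two 0x00 bytes, then 0x4E and 0x2D
```

With UTF-8 the list is empty, so a byte-oriented scan for an ASCII character is safe. With UTF-16 or UTF-32 the same scan would report false matches.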

> If there is a spec that requires this behavior, which spec is it?

The Unicode Standard. ;-)

See:

http://www.unicode.org/book/preview/ch03.pdf

and, in particular, Section 3.9, Encoding Forms.

--Ken

> Or can anyone give me an example of a conformant character
> encoding that does not reserve these bytes to us-ascii?
> thanks
>


This archive was generated by hypermail 2.1.5 : Wed May 14 2003 - 21:04:13 EDT