Re: Unicode 4.0 BETA available for review

From: Kenneth Whistler (kenw@sybase.com)
Date: Thu Feb 27 2003 - 14:38:36 EST

Next message: Roozbeh Pournader: "Re: Unicode 4.0 BETA available for review"

Previous message: Tex Texin: "Re: UTF-8 Error Handling (was: Re: Unicode 4.0 BETA available for review)"
Maybe in reply to: Asmus Freytag: "Unicode 4.0 BETA available for review"

Stefan Persson suggested:

> >Unicode 3.0 defined non-shortest UTF-8 as *irregular* code value
> >sequences. There were two types:
> >
> > a. 0xC0 0x80 for U+0000 (instead of 0x00)
> > b. 0xED 0xA0 0x80 0xED 0xB0 0x80 for U+10000 (instead of 0xF0 0x90 0x80 0x80)
> >
> >
> Ah, but encoding NULL as a surrogate character and then encoding those
> two surrogates as three bytes each, making a total of 6 bytes for one
> character, would also be technically possible (though not legal), right?

I'm not sure what you are talking about, here.

First of all, there is no such thing as a "surrogate character",
under the terminology currently adopted by the standard.

There are surrogate code points: U+D800..U+DFFF. Those can
*never* be assigned to any abstract character.

Then there are surrogate code units: 0xD800..0xDFFF. Those are
used in pairs in the UTF-16 encoding form to represent a single
supplementary character (one encoded off the BMP).
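
The pairing of surrogate code units can be sketched in a few lines of
Python. This is only an illustration and not part of the original
message: the helper name utf16_code_units is made up here, and the
arithmetic is assumed to follow the usual definition of the UTF-16
encoding form.

```python
def utf16_code_units(cp):
    """Return the UTF-16 code units for one code point (illustrative helper)."""
    if not 0 <= cp <= 0x10FFFF:
        raise ValueError("not a Unicode code point")
    if 0xD800 <= cp <= 0xDFFF:
        # Surrogate code points are never assigned to abstract characters,
        # so there is nothing to encode.
        raise ValueError("surrogate code point")
    if cp < 0x10000:
        # BMP code point: a single 16-bit code unit.
        return [cp]
    # Supplementary code point: split the 20-bit offset into a pair
    # of surrogate code units (high, then low).
    offset = cp - 0x10000
    return [0xD800 + (offset >> 10), 0xDC00 + (offset & 0x3FF)]


print([hex(u) for u in utf16_code_units(0x10000)])  # ['0xd800', '0xdc00']
```

The two values in the result are code units, not characters: neither one
means anything on its own.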

NULL is U+0000.
  Its representation in UTF-32 is <0x00000000>.
  Its representation in UTF-16 is <0x0000>.
  Its representation in UTF-8 is <0x00>.

Period. End of story. Anything else is nonconformant to the standard.
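
For readers who want to check this against an implementation, here is a
minimal sketch using Python 3's built-in codecs. It is an illustration
added for this point, not something from the standard itself; the helper
name decodes_as_utf8 is hypothetical, and the big-endian codecs are
chosen only so that no byte order mark is emitted.

```python
NUL = "\u0000"

# Byte serializations of U+0000 in each encoding form.
utf32 = NUL.encode("utf-32-be")   # four zero bytes, i.e. <0x00000000>
utf16 = NUL.encode("utf-16-be")   # two zero bytes, i.e. <0x0000>
utf8 = NUL.encode("utf-8")        # one zero byte, i.e. <0x00>


def decodes_as_utf8(data):
    """Return True if Python's strict UTF-8 decoder accepts the bytes."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


print(decodes_as_utf8(b"\x00"))                      # True
print(decodes_as_utf8(b"\xc0\x80"))                  # False: non-shortest form
print(decodes_as_utf8(b"\xed\xa0\x80\xed\xb0\x80"))  # False: encoded surrogates
```

A strict decoder like this one rejects both of the irregular sequences
quoted earlier in the thread.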

--Ken

