RE: Astral planes (was: RE: Plane One use, was Re: HTML Validation)

From: Rick Cameron (Rick.Cameron@crystaldecisions.com)
Date: Tue Dec 18 2001 - 20:43:29 EST


"In the Unicode Standard, the codespace consists of the integers from 0 to
10FFFF<sub>16</sub>, comprising 1,114,112 code points available for
assigning the repertoire of abstract characters...."

Now that's what I call a definitive statement!

Thanks very much.

- rick cameron

-----Original Message-----
From: Kenneth Whistler [mailto:kenw@sybase.com]
Sent: Tuesday, 18 December 2001 17:37
To: Rick.Cameron@crystaldecisions.com
Cc: unicode@unicode.org
Subject: RE: Astral planes (was: RE: Plane One use, was Re: HTML Validation)

Rick continued:

> OK, so it is there in 3.0. But in the section on Surrogates? And on
> Transformations? A little obscure.

But you need to keep in mind that Chapter 3 is the Conformance chapter, the
key part of the formal definition of the standard.
 
>
> I expected to find it in section 2.3, for example, where the major
> encoding forms are being described; or even earlier - say in 1.1
> Coverage. Surely the range of valid scalar values is an important
> aspect of coverage!

It will be. Here are some sneak peeks at the current draft for the new
Section 2.5 Encoding Forms, for Unicode 4.0:

"In the Unicode Standard, the codespace consists of the integers from 0 to
10FFFF<sub>16</sub>, comprising 1,114,112 code points available for
assigning the repertoire of abstract characters...."
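
The figure in that sentence can be checked with a little arithmetic. The following is only an illustrative sketch, not part of the draft text: the names are hypothetical, and only the bounds 0 and 10FFFF (hex) come from the quotation. The codespace divides into 17 planes of 65,536 code points each.

```python
# Illustrative names (assumptions); only the numeric bounds come from the quoted draft.
MAX_CODE_POINT = 0x10FFFF   # upper end of the codespace
PLANE_SIZE = 0x10000        # 65,536 code points per plane

# Code points run from 0 to MAX_CODE_POINT inclusive.
codespace_size = MAX_CODE_POINT + 1
plane_count = codespace_size // PLANE_SIZE


def in_codespace(cp: int) -> bool:
    """Return True if cp is an integer inside the Unicode codespace."""
    return 0 <= cp <= MAX_CODE_POINT


print(codespace_size, plane_count)
```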

"As for all of the Unicode encoding forms, UTF-32 is restricted to
representation of code points in the range 0..10FFFF<sub>16</sub>, that is,
the Unicode codespace...."
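
One way to picture that restriction is a range check on each 32-bit code unit. This is a minimal sketch under assumptions: the function name is made up for illustration, big-endian byte order is assumed, and it checks only the range mentioned in the quotation. A conformant decoder would also reject the surrogate code points D800..DFFF (hex), which this sketch does not do.

```python
import struct

MAX_CODE_POINT = 0x10FFFF  # upper bound taken from the quoted draft


def decode_utf32_be(data: bytes) -> list:
    """Decode big-endian 32-bit code units, rejecting values outside the codespace.

    Hypothetical helper for illustration; surrogates are NOT checked here.
    """
    if len(data) % 4 != 0:
        raise ValueError("UTF-32 data must be a multiple of 4 bytes")
    # Each code unit is one unsigned 32-bit integer.
    units = struct.unpack(f">{len(data) // 4}I", data)
    for unit in units:
        if unit > MAX_CODE_POINT:
            raise ValueError(f"code unit {unit:#x} is outside the codespace")
    return list(units)
```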

"In the UTF-16 encoding form, code points in the range U+0000..U+FFFF are
represented as a single 16-bit code unit; code points in the supplementary
planes, in the range U+10000..U+10FFFF, are instead represented as pairs of
16-bit code units...."
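
The single-unit versus pair split described there can be sketched as follows. This is an illustration rather than text from the draft: the function name is hypothetical, and the surrogate ranges D800..DBFF and DC00..DFFF (hex) are details of UTF-16 that the quoted excerpt does not spell out.

```python
def utf16_code_units(cp: int) -> list:
    """Return the UTF-16 code units for one code point (illustrative helper)."""
    if not 0 <= cp <= 0x10FFFF:
        raise ValueError("code point is outside the Unicode codespace")
    if 0xD800 <= cp <= 0xDFFF:
        raise ValueError("surrogate code points cannot be encoded")
    if cp <= 0xFFFF:
        # U+0000..U+FFFF: a single 16-bit code unit.
        return [cp]
    # U+10000..U+10FFFF: subtract 0x10000 to get a 20-bit offset,
    # then split it into two 10-bit halves.
    offset = cp - 0x10000
    high = 0xD800 + (offset >> 10)    # leading (high) surrogate
    low = 0xDC00 + (offset & 0x3FF)   # trailing (low) surrogate
    return [high, low]


# U+1D11E MUSICAL SYMBOL G CLEF lies in Plane 1, so it needs a pair.
print([hex(u) for u in utf16_code_units(0x1D11E)])  # ['0xd834', '0xdd1e']
```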

> I hope this aspect of the standard will be front and centre in 4.0.

Is that front and centre enough for you?

--Ken

P.S. the 1.1 Coverage section is intended to deal (briefly) with what
scripts and types of characters are covered by the standard, and what other
standards are covered by the standard -- not codespace structure or encoding
forms.


