From: Kenneth Whistler (kenw@sybase.com)
Date: Wed May 14 2003 - 18:45:07 EDT
Rick Cameron asked:
> (I'm sure this is an FAQ - but why are the code points 0xd800-0xdfff not
> considered noncharacters? Obviously no abstract character can be associated
> with them! Is there a different term that describes code points like this?)
The answer is not in the online preview of Chapter 3, but rather in
Chapter 2 of Unicode 4.0. (Incidentally, the editors are trying to get
preview versions of Chapters 1 and 2 posted as well, to help
out with questions like this while we are waiting for the actual
publication of the book from Addison-Wesley.)
The answer is that in the new scheme for 4.0, the Unicode Technical Committee
has decided on a nomenclature that divides code points into
7 basic types (gc refers to General Category property values):
1. Graphic (gc = [L, M, N, P, S, Zs])
2. Format (gc = [Cf, Zl, Zp])
3. Control (gc = Cc)
4. Private-use (gc = Co)
5. Surrogate (gc = Cs)
6. Noncharacter (gc = Cn, in part)
7. Reserved (gc = Cn, in part)
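
As a rough illustration (not part of the standard), the seven types above
can be derived from the General Category in Python. This is a minimal
sketch: the function names is_noncharacter and basic_type are made up for
this example, and the split of gc = Cn into Noncharacter and Reserved
relies on the fixed set of 66 noncharacters (U+FDD0..U+FDEF plus the last
two code points of each plane). Results for Reserved code points depend on
the version of the Unicode Character Database bundled with the Python in
use.

```python
import unicodedata


def is_noncharacter(cp: int) -> bool:
    """True for the 66 noncharacters: U+FDD0..U+FDEF and U+nFFFE/U+nFFFF."""
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE


def basic_type(cp: int) -> str:
    """Map a code point to one of the seven basic types (hypothetical helper)."""
    gc = unicodedata.category(chr(cp))
    if gc[0] in "LMNPS" or gc == "Zs":
        return "Graphic"
    if gc in ("Cf", "Zl", "Zp"):
        return "Format"
    if gc == "Cc":
        return "Control"
    if gc == "Co":
        return "Private-use"
    if gc == "Cs":
        return "Surrogate"
    if gc == "Cn":
        # Cn covers both noncharacters and reserved code points.
        return "Noncharacter" if is_noncharacter(cp) else "Reserved"
    raise ValueError(f"unexpected General Category {gc!r}")


if __name__ == "__main__":
    for cp in (0x0041, 0x200E, 0x000A, 0xE000, 0xD800, 0xFFFF, 0x0378):
        print(f"U+{cp:04X}: {basic_type(cp)}")
```

Note that U+D800 comes out as Surrogate rather than Noncharacter, which is
the distinction the original question was asking about.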
Types 1-4 are considered *assigned* to abstract characters.
Types 5-7 are considered *not assigned* to abstract characters.
Types 1-6 are considered *designated* code points (which means
that the standard specifies something normative about their
usage).
Type 7 code points are considered *undesignated* (which means they
are reserved for future use, and in principle could be turned
into any of types 1-4 or 6 by future changes).
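
The assigned/designated groupings can be written down as two predicates
over the type numbers. This is only a sketch of the relationships stated
above; the names BasicType, is_assigned, and is_designated are invented
for the example and do not come from the standard.

```python
from enum import IntEnum


class BasicType(IntEnum):
    """The seven basic types, numbered as in the list above."""
    GRAPHIC = 1
    FORMAT = 2
    CONTROL = 3
    PRIVATE_USE = 4
    SURROGATE = 5
    NONCHARACTER = 6
    RESERVED = 7


def is_assigned(t: BasicType) -> bool:
    """Types 1-4: assigned to abstract characters."""
    return t <= BasicType.PRIVATE_USE


def is_designated(t: BasicType) -> bool:
    """Types 1-6: the standard says something normative about their use."""
    return t <= BasicType.NONCHARACTER


if __name__ == "__main__":
    for t in BasicType:
        print(f"{t.value}. {t.name}: assigned={is_assigned(t)}, "
              f"designated={is_designated(t)}")
```

Surrogates and noncharacters are the two types that are designated but
not assigned; reserved code points are neither.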
Type 4, Private-use code points, are somewhat odd, in that they
are considered assigned to abstract characters, but the abstract
characters are *truly* abstract, i.e., essentially, private use
character #1, private use character #2, ..., and the standard
gives them no further semantic interpretation. But the convention
was chosen because implementations are more robust if they treat
all the private-use code points as if they had characters assigned
to them, rather than as if they were just reserved.
--Ken