L2/02-178

To: UTC
Re: Terminology for types of code points
From: Ed Committee
Date: 2001-04-26

The editorial committee is seeking feedback from the UTC on a matter of terminology, having to do with types of code points.

There are the following main types of code points that we need to distinguish:

  1. 'Normal' characters
  2. Format characters
  3. PUA characters*
  4. Surrogate code points*
  5. Noncharacter code points*
  6. 'Open' code points
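
As a rough illustration of the six types above, the following Python sketch sorts a code point into one of them using the General Category reported by the standard `unicodedata` module. The function name `code_point_type` and the type labels are hypothetical, chosen only for this example. Folding control characters (Cc) into 'Normal' is an assumption, since the list above does not say where they belong. Which code points come out as 'Open' depends on the Unicode version bundled with the Python build.

```python
import unicodedata

def code_point_type(cp: int) -> str:
    """Classify a code point into one of the six types listed in this memo.

    Hypothetical helper for illustration; the labels are not official terms.
    """
    if not 0 <= cp <= 0x10FFFF:
        raise ValueError("not a Unicode code point")
    gc = unicodedata.category(chr(cp))
    if gc == "Cs":
        return "surrogate"
    if gc == "Co":
        return "pua"
    if gc == "Cn":
        # Noncharacters: U+FDD0..U+FDEF and the last two code points of each plane.
        if 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE:
            return "noncharacter"
        return "open"
    if gc == "Cf":
        return "format"
    # Assumption: everything else, including controls (Cc), counts as 'Normal'.
    return "normal"

for cp in (0x0041, 0x200D, 0xE000, 0xD800, 0xFFFF, 0x40000):
    print(f"U+{cp:04X}: {code_point_type(cp)}")
```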

Notes:

Several unions of these sets are used often and need names of their own. In addition, at least 'Open' above needs a good name.

  A. All but surrogates
  B. Surrogate, Noncharacter, Open
  C. Normal, Format, PUA
  D. Open, Noncharacter
  E. Open

We then have terminology that we have used imprecisely in the past:

Term          Past usage
Assigned      In the UCD docs, Cn is equated to this
              Also used for #1-#5 (i.e. non-open)
              Also used for #1-#3 (i.e. code points not assigned to characters)
Unassigned    Inverse of Assigned
Scalar Value  Code point
              Nonsurrogates
Reserved      Surrogate, Noncharacter, Open in 10646, with different
              adjectives qualifying the different groups

In coming up with names, we also need to make sure that negations are reasonable: that nonX means all code points that are not X.

The question is: what terms should we choose for A-E? There are two possible positions:

Term                              Position 1                                 Position 2
A: All but surrogates             Nonsurrogate code point                    Scalar Value code point
B: Surrogate, Noncharacter, Open  Noncharacter code point                    Unassigned code point
C: Normal, Format, PUA            Character code point                       Assigned code point
D: Open, Noncharacter             ??? code point                             ??? code point
E: Open                           Unassigned code point                      Nondesignated code point
not Open                          Assigned code point                        Designated code point
Noncharacter                      Internal-Use code point (Infernal-Use ;-)  Noncharacter code point
Surrogates                        Surrogate code point                       Surrogate code point, Nonscalar value code point
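
The requirement that nonX mean all code points that are not X can be checked mechanically. The Python sketch below models the six types as labels and the unions A-E as sets of those labels, then looks for a union that is the exact complement of another. The names `TYPES`, `UNIONS`, and `negation_partner` are hypothetical and exist only for this illustration.

```python
# The six types of code points, as hypothetical labels.
TYPES = frozenset({"normal", "format", "pua", "surrogate", "noncharacter", "open"})

# The unions A-E, expressed as sets of type labels.
UNIONS = {
    "A": TYPES - {"surrogate"},                             # all but surrogates
    "B": frozenset({"surrogate", "noncharacter", "open"}),
    "C": frozenset({"normal", "format", "pua"}),
    "D": frozenset({"open", "noncharacter"}),
    "E": frozenset({"open"}),
}

def negation_partner(label):
    """Return the union that is exactly the complement of `label`, or None."""
    target = TYPES - UNIONS[label]
    for other, members in UNIONS.items():
        if members == target:
            return other
    return None

for label in UNIONS:
    print(label, "->", negation_partner(label))
```

Under this model, B and C are each other's complement, so a pair of terms such as Unassigned and Assigned, or Noncharacter and Character, negates cleanly for them. A, D, and E have no complement among A-E. The complement of A is the surrogates alone and the complement of E is 'not Open', and both of those have their own rows in the table.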

Feedback from the UTC would be most appreciated as to which of these choices would be the most reasonable and least confusing. The committee is not especially happy with either set of terms and is very open to other suggestions for a cohesive set!