L2/03-041R2
Re: Base Character Definition D13
From: Mark Davis
Date: 2003-02-10 (Updated 2003-08-26)
[This document has been updated per the UTC discussion on 2003-08-26; see the
part below the horizontal line.]
I got a question from someone here about the exact definition of "base
character". I took a look at it, and the definition is very badly written.
We have:
D13 Base character: a character that does not graphically combine with preceding characters, and that is neither a control nor a format character.
Most Unicode characters are base characters. This sense of graphic combination does not preclude the presentation of base characters from adopting different contextual forms or participating in ligatures.
D14 Combining character: a character that graphically combines with a preceding base character. The combining character is said to apply to that base character.
In determining what D13 actually means in practice, one might start by
analyzing it as follows:
- it is a character (so remove Cn, Cs)
- it is not a control or format (so remove Cc, Cf)
- it is not a combining character (so remove Mc, Mn, Me).
But this is not exactly crystal clear. Zl and Zp (line and paragraph
separators) are not explicitly mentioned but certainly must be addressed. The
two definitions D13 and D14 are also circular. Neither the definitions nor the
notes mention private use characters.
We propose the following fix to the text, in light of our new Grand
Character Typology in Chapter 2, for the next appropriate version of the
Unicode Standard:
D13a Graphic character: a character with the General Categories of Letter (L), Combining Mark (M), Number (N), Punctuation (P), Symbol (S), or Space Separator (Zs).
D13b Base character: any graphic character except for those with the General Category of Combining Mark (M).
D14 Combining character: a character with the General Category of Combining Mark (M).
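As an illustration only, the proposed definitions can be sketched as simple tests on the General Category. The sketch below uses Python's standard unicodedata module, whose data reflects whatever Unicode version ships with the interpreter. The function names (is_graphic, is_combining, is_base, is_base_naive) are hypothetical and are not part of the proposal. The last function encodes the informal reading of the old D13 from the analysis above, so that the two can be compared on line separators and private use characters.

```python
import unicodedata

# Major classes named in the proposed D13a: Letter, Mark, Number,
# Punctuation, Symbol. Space Separator (Zs) is checked separately,
# because Zl and Zp share the major class "Z" but are not included.
GRAPHIC_MAJOR_CLASSES = {"L", "M", "N", "P", "S"}


def is_graphic(ch: str) -> bool:
    """Proposed D13a: General Category L, M, N, P, S, or Zs."""
    gc = unicodedata.category(ch)
    return gc[0] in GRAPHIC_MAJOR_CLASSES or gc == "Zs"


def is_combining(ch: str) -> bool:
    """Proposed D14: General Category Combining Mark (Mn, Mc, Me)."""
    return unicodedata.category(ch).startswith("M")


def is_base(ch: str) -> bool:
    """Proposed D13b: any graphic character that is not a Combining Mark."""
    return is_graphic(ch) and not is_combining(ch)


def is_base_naive(ch: str) -> bool:
    """Informal reading of the old D13: drop Cn, Cs, Cc, Cf, Mc, Mn, Me.

    This is an assumption about how the old wording might be applied;
    it leaves Zl, Zp, and Co (private use) classified as base characters.
    """
    excluded = {"Cn", "Cs", "Cc", "Cf", "Mc", "Mn", "Me"}
    return unicodedata.category(ch) not in excluded


if __name__ == "__main__":
    samples = ["a", "\u0301", " ", "\u2028", "\ue000", "\u200d"]
    for ch in samples:
        print(
            f"U+{ord(ch):04X}",
            unicodedata.category(ch),
            "graphic" if is_graphic(ch) else "non-graphic",
            "base" if is_base(ch) else "not base",
            "naive-base" if is_base_naive(ch) else "naive-not-base",
        )
```

Under this sketch, U+2028 LINE SEPARATOR (Zl) and U+E000 (Co) count as base characters in the informal reading of the old D13, but not under the proposed D13b, which is the gap the new wording is meant to close.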