L2/03-041R
Source: Mark Davis
Date: Feb 10, 2003 (Updated 2003-06-13)
Title: Base Character Definition D13

I got a question from someone here about the exact definition of "base character". I took a look at it, and the definition is very badly written. We have:

D13 Base character: a character that does not graphically combine with preceding characters, and that is neither a control nor a format character.
- Most Unicode characters are base characters. This sense of graphic combination does not preclude the presentation of base characters from adopting different contextual forms or participating in ligatures.

D14 Combining character: a character that graphically combines with a preceding base character. The combining character is said to apply to that base character.
...

In determining what D13 actually means in practice, one might start by analyzing it as follows:
- it is a character (so remove Cn, Cs)
- it is not a control or format (so remove Cc, Cf)
- it is not a combining character (so remove Mc, Mn, Me).

But this is not exactly crystal clear. Zl and Zp (line/paragraph separators) are not explicitly mentioned, but they certainly must be excluded too. The two definitions D13 and D14 are also circular: each is defined in terms of the other. Neither the definitions nor the notes mention private use characters.

I propose the following fix to the text, in light of our new Grand Character Typology in Chapter 2, for the next appropriate version of the Unicode Standard:

D13 Base character: an independent graphic character, specifically excluding control and format characters.
- Most Unicode characters are base characters. This sense of graphic combination does not preclude the presentation of base characters from adopting different contextual forms or participating in ligatures.
- In terms of General Category values (see Chapter 4), a base character is any code point that has one of the categories Letter (L), Number (N), Punctuation (P), Symbol (S), Space Separator (Zs). (In other words, it excludes the values Cn, Cs, Cc, Cf, Zl, Zp, Mc, Mn, and Me.) The interpretation of Private Use characters (Co) as base characters or not is determined by the implementation.

D14 Combining character: a character that graphically depends on the last preceding base character. The combining character is said to apply to that base character. Also known as combining mark.
- Combining characters consist of all characters with the General Category values of Spacing Combining Mark (Mc), Non-Spacing Mark (Mn), and Enclosing Mark (Me). The interpretation of Private Use characters (Co) as combining characters or not is determined by the implementation.
...
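
To make the proposed General Category test concrete, here is a minimal Python sketch of how an implementation might apply it. It is only an illustration and is not part of the proposed text. The function name `classify` and the parameter `private_use_as` are hypothetical; the parameter stands in for the implementation-defined treatment of Private Use (Co) characters. The sketch relies on Python's standard `unicodedata` module, whose data reflects whatever Unicode version ships with the interpreter, not the version under discussion here.

```python
import unicodedata

# Major classes that count as base characters under the proposed D13:
# Letter (L), Number (N), Punctuation (P), Symbol (S).
# Space Separator (Zs) is handled separately, since Zl and Zp are excluded.
BASE_MAJOR_CLASSES = {"L", "N", "P", "S"}

# General Category values that make a combining character under the proposed D14.
COMBINING_CATEGORIES = {"Mc", "Mn", "Me"}


def classify(ch: str, private_use_as: str = "other") -> str:
    """Return "base", "combining", or "other" for a single code point.

    private_use_as is a hypothetical knob: the proposal leaves the treatment
    of Private Use (Co) characters to the implementation, so the caller
    chooses the answer for them.
    """
    gc = unicodedata.category(ch)
    if gc == "Co":
        return private_use_as
    if gc in COMBINING_CATEGORIES:
        return "combining"
    if gc[0] in BASE_MAJOR_CLASSES or gc == "Zs":
        return "base"
    # Everything else: Cn, Cs, Cc, Cf, Zl, Zp.
    return "other"
```

For example, `classify("a")` and `classify(" ")` both give `"base"`, `classify("\u0301")` (COMBINING ACUTE ACCENT, Mn) gives `"combining"`, and `classify("\u2028")` (LINE SEPARATOR, Zl) gives `"other"`. A private use code point such as U+E000 gives whatever the caller passes as `private_use_as`.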