Re: How many printable characters in 3.2.0?

From: Kenneth Whistler (kenw@sybase.com)
Date: Mon Apr 22 2002 - 13:10:10 EDT


Stefan noted:

> Also, you can add accents and such to just about any character. Shall "a"
> with an acute accent be considered a printable character? And what about the
> Chinese character for "one" with an acute accent? And different kinds of
> accents from different scripts can be added to the same character...

Actually, there is terminology for this.

If the question is, "How many representations for printable abstract
characters are there in Unicode?" then the question is unanswerable,
because of the open-endedness of representation by combining character
sequences.

If the question is, "How many encoded characters are there in Unicode
that can be considered 'printable'?" then the question is answerable,
although it is often difficult to make determinations about what
exactly 'printable' means for some edge cases. If you assume a definition
something like: "having a visible glyph, or consisting of a space
character", which is a reasonable extension from the ASCII-based
tradition of "printable", then the figures I gave earlier would apply.
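
A rough sketch of how such a count could be reproduced in Python. The exclusion set below is only one possible reading of "having a visible glyph, or consisting of a space character", and the names NON_PRINTABLE, is_printable and count_printable are made up for illustration. The standard unicodedata module exposes a frozen 3.2.0 database as unicodedata.ucd_3_2_0, so the count can be taken against that version rather than whatever version the interpreter ships with. The result depends on how the edge cases are classified, so it need not match the figures referred to above.

```python
import sys
import unicodedata

# Assumption: "printable" means the General_Category is none of control (Cc),
# format (Cf), surrogate (Cs), private use (Co), unassigned (Cn), or
# line/paragraph separator (Zl, Zp). Space separators (Zs) and combining
# marks count as printable under this reading.
NON_PRINTABLE = {"Cc", "Cf", "Cs", "Co", "Cn", "Zl", "Zp"}


def is_printable(ch, db=unicodedata):
    """Return True if the encoded character ch is 'printable' under the assumption above."""
    return db.category(ch) not in NON_PRINTABLE


def count_printable(db=unicodedata):
    """Count printable encoded characters over the whole code space for database db."""
    return sum(1 for cp in range(sys.maxunicode + 1) if is_printable(chr(cp), db))


if __name__ == "__main__":
    old = unicodedata.ucd_3_2_0
    print(old.unidata_version, count_printable(old))
    print(unicodedata.unidata_version, count_printable())
```

Moving a category into or out of NON_PRINTABLE changes the total, which is exactly the edge-case problem with defining 'printable'.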

Note that <U+0061, U+0301> is *not* an encoded character in Unicode;
it is a combining character sequence that represents an abstract
character {á}, which also happens to be encoded as U+00E1. If you
stick carefully to these terminological distinctions, then you don't
end up tying yourself into knots about "what is a character?" in Unicode.
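
The distinction can be seen in a short Python sketch using the standard unicodedata module. The variable names are illustrative only. It compares the combining character sequence with the precomposed encoded character, and then does the same for Stefan's example of the Chinese character for "one" with an acute accent, which has no precomposed form.

```python
import unicodedata

sequence = "\u0061\u0301"   # <U+0061, U+0301>: a combining character sequence
precomposed = "\u00e1"      # U+00E1: a single encoded character

# Two code points versus one, and the strings do not compare equal as-is.
print(len(sequence), len(precomposed), sequence == precomposed)

# Normalization shows that both represent the same abstract character.
print(unicodedata.normalize("NFC", sequence) == precomposed)
print(unicodedata.normalize("NFD", precomposed) == sequence)

# U+4E00 followed by U+0301 is a valid combining character sequence, but
# no encoded character corresponds to it, so NFC leaves it unchanged.
cjk_sequence = "\u4e00\u0301"
print(unicodedata.normalize("NFC", cjk_sequence) == cjk_sequence)
```

Sequences of the second kind are what make the count of representations open-ended, while the count of encoded characters stays fixed for a given version.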

--Ken


