Markus Scherer noted:
> However, it probably makes sense for files as an easy and somewhat compact
> format, and it makes sense for the number of possible characters: 1M + 64k,
> including 128k+6400 private use character code points. There are about 38000
> characters assigned so far, with about 20000-30000 more in the pipeline.
Here are the exact values for what is currently encoded and what Unicode 3.0
will contain (synchronized with the prospective content of the republication
of ISO/IEC 10646-1):
Unicode 2.1:

   6813  Misc. characters
  20902  Unihan
  11172  Johab Hangul
   6400  Private use
   2048  Surrogates
     65  Controls
      2  Not characters
  18134  Unassigned assignable

  38887  Assigned graphic characters

Unicode 3.0 (prospective, as of November 3, 1998):

  10554  Misc. characters
  20902  Unihan
   6582  Unihan Extension A
  11172  Johab Hangul
   6400  Private use
   2048  Surrogates
     65  Controls
      2  Not characters
   7811  Unassigned assignable

  49210  Assigned graphic characters

That is a net gain of 10323 new characters.
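
The tallies above can be cross-checked mechanically. The following is only a sketch: the dictionary names, the GRAPHIC grouping, and the helper assigned_graphic are made up for illustration, and it assumes that "Assigned graphic characters" is the sum of the Misc., Unihan, Unihan Extension A, and Johab Hangul rows, and that the remaining rows fill out the rest of the 65536 code points of the BMP.

```python
# Tallies copied from the two tables above (BMP code points only).
BMP_SIZE = 0x10000  # 65536 code points in the Basic Multilingual Plane

unicode_2_1 = {
    "Misc. characters": 6813,
    "Unihan": 20902,
    "Johab Hangul": 11172,
    "Private use": 6400,
    "Surrogates": 2048,
    "Controls": 65,
    "Not characters": 2,
    "Unassigned assignable": 18134,
}

unicode_3_0 = {
    "Misc. characters": 10554,
    "Unihan": 20902,
    "Unihan Extension A": 6582,
    "Johab Hangul": 11172,
    "Private use": 6400,
    "Surrogates": 2048,
    "Controls": 65,
    "Not characters": 2,
    "Unassigned assignable": 7811,
}

# Assumption: these rows are the ones counted as "assigned graphic characters".
GRAPHIC = ("Misc. characters", "Unihan", "Unihan Extension A", "Johab Hangul")


def assigned_graphic(tally):
    """Sum the rows assumed to count as assigned graphic characters."""
    return sum(count for name, count in tally.items() if name in GRAPHIC)


for label, tally in (("Unicode 2.1", unicode_2_1), ("Unicode 3.0", unicode_3_0)):
    # Prints the graphic subtotal and the grand total for each version.
    print(label, assigned_graphic(tally), sum(tally.values()))

print("net gain:", assigned_graphic(unicode_3_0) - assigned_graphic(unicode_2_1))
```

Under those assumptions both versions account for exactly 65536 code points, the graphic subtotals come out to 38887 and 49210, and the difference is 10323, matching the figures quoted above.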
Others have noted the following, but I would like to reiterate, so that
*correct* rumors can circulate, instead of incorrect ones:
Unicode 3.0 will *not* contain any encoded characters requiring surrogates.
The republication of ISO/IEC 10646-1 will *not* contain any encoded
characters outside of the Basic Multilingual Plane.
Planes 1, 2, and 14 are for ISO/IEC 10646-2, which is still in
working draft and which has not yet even started a CD ballot. When 10646
Part 2 progresses far enough, we anticipate publishing a Version 4.0 of
the Unicode Standard -- and *that* will make use of surrogate codes
to access encoded characters on Planes 1 and beyond.
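
For readers unfamiliar with how surrogate codes reach Planes 1 and beyond, here is a small sketch of the standard UTF-16 mapping between a supplementary code point and a high/low surrogate pair. The function names to_surrogates and from_surrogates are invented for this example and are not taken from any particular library.

```python
def to_surrogates(code_point):
    """Split a supplementary code point (U+10000..U+10FFFF) into a surrogate pair."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("only code points beyond the BMP need surrogates")
    offset = code_point - 0x10000      # 20-bit offset into Planes 1..16
    high = 0xD800 + (offset >> 10)     # top 10 bits: high surrogate, D800..DBFF
    low = 0xDC00 + (offset & 0x3FF)    # bottom 10 bits: low surrogate, DC00..DFFF
    return high, low


def from_surrogates(high, low):
    """Recombine a high/low surrogate pair into the code point it encodes."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a valid high/low surrogate pair")
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


# First code point of Planes 1, 2, and 14, the planes mentioned above.
for cp in (0x10000, 0x20000, 0xE0000):
    high, low = to_surrogates(cp)
    print(f"U+{cp:05X} -> {high:04X} {low:04X}")
# U+10000 -> D800 DC00
# U+20000 -> D840 DC00
# U+E0000 -> DB40 DC00
```

The 2048 surrogate code points in the tallies split into 1024 high and 1024 low values, so the pairs address 1024 * 1024 = 1048576 code points, which is exactly Planes 1 through 16.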
--Ken Whistler
This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:20:42 EDT