Re: Displaying Plane 1 characters (annotating the code table

From: Kenneth Whistler (kenw@sybase.com)
Date: Mon Nov 09 1998 - 15:37:17 EST

Next message: Mark Davis: "Re: Displaying Plane 1 characters (annotating the code table"
Previous message: Kenneth Whistler: "Re: UCS-4 as a conformant form of Uniocde"
Maybe in reply to: Markus Scherer: "Re: Displaying Plane 1 characters (annotating the code table"
Next in thread: Mark Davis: "Re: Displaying Plane 1 characters (annotating the code table"

Markus Scherer noted:

> However, it probably makes sense for files as an easy and somewhat compact
> format, and it makes sense for the number of possible characters: 1M + 64k,
> including 128k+6400 private use character code points. There are about 38000
> characters assigned so far, with about 20000-30000 more in the pipeline.
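
The "1M + 64k" and "128k+6400" figures quoted above can be checked with a few lines of arithmetic. This is only a sketch: the constant and variable names are illustrative, and it assumes the private use figure counts every code point in the BMP range U+E000..U+F8FF plus all of Planes 15 and 16.

```python
# Code space reachable through UTF-16: the BMP plus 16 supplementary planes.
PLANE_SIZE = 0x10000   # 65,536 code points per plane
NUM_PLANES = 17        # planes 0 through 16

total_code_points = PLANE_SIZE * NUM_PLANES   # the "1M + 64k"
supplementary = PLANE_SIZE * (NUM_PLANES - 1)  # the "1M" reached via surrogate pairs

# Assumption: private use = U+E000..U+F8FF in the BMP plus all of planes 15 and 16.
bmp_private_use = 0xF8FF - 0xE000 + 1
private_use = 2 * PLANE_SIZE + bmp_private_use

print(total_code_points, supplementary, bmp_private_use, private_use)
```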

Here are the exact counts of what is currently encoded and of what
Unicode 3.0 will contain (synched with the prospective content of the
republication of ISO/IEC 10646-1):

Unicode 2.1:

   6813  Misc. characters
  20902  Unihan
  11172  Johab Hangul
   6400  Private use
   2048  Surrogates
     65  Controls
      2  Not characters
  18134  Unassigned assignable

  38887  Assigned graphic characters

Unicode 3.0 (prospective, as of November 3, 1998):

  10554  Misc. characters
  20902  Unihan
   6582  Unihan Extension A
  11172  Johab Hangul
   6400  Private use
   2048  Surrogates
     65  Controls
      2  Not characters
   7811  Unassigned assignable

  49210  Assigned graphic characters

That is a net gain of 10323 new characters (49210 - 38887).
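
For anyone who wants to re-check the tallies, here is a small sketch that confirms each breakdown accounts for all 65536 code points of the BMP and that the assigned graphic totals and the net gain follow from the rows. The dictionary layout and the choice of which rows count as "assigned graphic" are my reading of the two lists, not anything normative.

```python
BMP_SIZE = 0x10000  # 65,536 code points in the Basic Multilingual Plane

# Assumption: these rows are the ones summed into "Assigned graphic characters".
GRAPHIC = ("Misc. characters", "Unihan", "Unihan Extension A", "Johab Hangul")

unicode_2_1 = {
    "Misc. characters": 6813,
    "Unihan": 20902,
    "Johab Hangul": 11172,
    "Private use": 6400,
    "Surrogates": 2048,
    "Controls": 65,
    "Not characters": 2,
    "Unassigned assignable": 18134,
}

unicode_3_0 = {
    "Misc. characters": 10554,
    "Unihan": 20902,
    "Unihan Extension A": 6582,
    "Johab Hangul": 11172,
    "Private use": 6400,
    "Surrogates": 2048,
    "Controls": 65,
    "Not characters": 2,
    "Unassigned assignable": 7811,
}


def assigned_graphic(tally):
    """Sum only the rows that count as assigned graphic characters."""
    return sum(count for name, count in tally.items() if name in GRAPHIC)


# Every code point of the BMP should land in exactly one row.
for tally in (unicode_2_1, unicode_3_0):
    assert sum(tally.values()) == BMP_SIZE

net_gain = assigned_graphic(unicode_3_0) - assigned_graphic(unicode_2_1)
print(assigned_graphic(unicode_2_1), assigned_graphic(unicode_3_0), net_gain)
```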

Others have noted the following, but I would like to reiterate, so that
*correct* rumors can circulate, instead of incorrect ones:

Unicode 3.0 will *not* contain any encoded characters requiring surrogates.
The republication of ISO/IEC 10646-1 will *not* contain any encoded
characters outside of the Basic Multilingual Plane.

Planes 1, 2, and 14 are for ISO/IEC 10646-2, which is still a
working draft and has not yet even started a CD ballot. When 10646
Part 2 progresses far enough, we anticipate publishing a Version 4.0 of
the Unicode Standard -- and *that* will make use of surrogate codes
to access encoded characters on Planes 1 and beyond.
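
To illustrate how surrogate codes reach Plane 1 and beyond, here is a minimal sketch of the standard UTF-16 surrogate-pair arithmetic. The function names are made up for this example. The 2048 surrogates in the tallies above split into 1024 high and 1024 low values, and 1024 x 1024 pairs cover the 16 supplementary planes.

```python
HIGH_START, LOW_START = 0xD800, 0xDC00  # first high and first low surrogate


def to_surrogate_pair(code_point):
    """Split a supplementary code point (U+10000..U+10FFFF) into a surrogate pair."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("not a supplementary-plane code point")
    offset = code_point - 0x10000           # 20-bit value
    high = HIGH_START + (offset >> 10)      # top 10 bits
    low = LOW_START + (offset & 0x3FF)      # bottom 10 bits
    return high, low


def from_surrogate_pair(high, low):
    """Recombine a high/low surrogate pair into the code point it encodes."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a valid surrogate pair")
    return 0x10000 + ((high - HIGH_START) << 10) + (low - LOW_START)


# The first code point of Plane 1 and the last code point of Plane 16.
first = to_surrogate_pair(0x10000)
last = to_surrogate_pair(0x10FFFF)
print([hex(unit) for unit in first], [hex(unit) for unit in last])
```

Code points in the BMP need no such treatment, which is why nothing in Unicode 3.0 depends on this mechanism.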

--Ken Whistler

This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:20:42 EDT