And the answer is...
But wait, first some relevant data. Here is the periodic
tally of the content of the various versions of the
Unicode Standard, updated with the just-finalized content
for Unicode 3.2. (The Singapore WG2 meeting sent the relevant
10646 amendment on for FDAM balloting--the final step before
publication--and we are basing the content of Unicode 3.2
on that amendment.)

10646:                       1st ed  to Amd 7          2nd ed  +Part2   Amd 1
Unicode:              U 1.0   U 1.1     U 2.0   U 2.1   U 3.0   U 3.1   U 3.2

BMP Alphas/Symbols     4748    6309      6509    6511   10236   10238   11195
Suppl Alphas/Symbols                                             1691    1691
Han (URO)             20902   20902     20902   20902   20902   20902   20902
Han (Ext A)                                              6582    6582    6582
Han (Ext B)                                                     42711   42711
Han Compat              302     302       302     302     302     302     361
Suppl Han Compat                                                  542     542
Hangul Sylls           2350    6656     11172   11172   11172   11172   11172

Subtotal              28302   34169     38885   38887   49194   94140   95156

BMP Private Use        5632    6400      6400    6400    6400    6400    6400
Suppl Private Use                      131068  131068  131068  131068  131068
Surrogate Code Points                    2048    2048    2048    2048    2048
Controls                 65      65        65      65      65      65      65
BMP Noncharacters         2       2         2       2       2      34      34
Suppl Noncharacters                        32      32      32      32      32
BMP Reserved          31535   24900     18136   18134    7827    7793    6777
Suppl Reserved                         917476  917476  917476  872532  872532

(Sorry about that for those of you with email clients that
wrap on less than 77 characters -- at least I took the
tabs out of the lines!)
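
For anyone who wants to check the tallies, here is a minimal Python sketch
that adds up the Unicode 3.2 column of the table above. The dictionary keys
and variable names are only illustrative labels for the table rows, not
identifiers from any Unicode data file. It assumes a plane holds 65536 code
points and that the "Suppl" rows cover planes 1 through 16.

```python
# Unicode 3.2 column of the table, split into BMP and supplementary rows.
# All names here are illustrative labels, not official identifiers.
PLANE_SIZE = 0x10000      # 65536 code points per plane
SUPPL_PLANES = 16         # assumption: "Suppl" rows cover planes 1..16

bmp_graphic = {
    "alphas_symbols": 11195,
    "han_uro": 20902,
    "han_ext_a": 6582,
    "han_compat": 361,
    "hangul_syllables": 11172,
}
bmp_other = {
    "private_use": 6400,
    "surrogates": 2048,
    "controls": 65,
    "noncharacters": 34,
    "reserved": 6777,
}
suppl_graphic = {
    "alphas_symbols": 1691,
    "han_ext_b": 42711,
    "han_compat": 542,
}
suppl_other = {
    "private_use": 131068,
    "noncharacters": 32,
    "reserved": 872532,
}

# The "Subtotal" row counts graphic characters across all planes.
subtotal = sum(bmp_graphic.values()) + sum(suppl_graphic.values())

# Each region should account for every code point it contains.
bmp_total = sum(bmp_graphic.values()) + sum(bmp_other.values())
suppl_total = sum(suppl_graphic.values()) + sum(suppl_other.values())
code_space = bmp_total + suppl_total

# Bits needed to represent the highest code point (code_space - 1).
bits_needed = (code_space - 1).bit_length()

print("subtotal:", subtotal)
print("BMP accounted for:", bmp_total == PLANE_SIZE)
print("Suppl accounted for:", suppl_total == SUPPL_PLANES * PLANE_SIZE)
print("code space:", code_space, "bits needed:", bits_needed)
```

If the rows are transcribed correctly, the subtotal should match the 95156
in the table and both regions should come out fully accounted for.
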
So Unicode 3.2, which will appear next spring about 1 year after
Unicode 3.1, has added 1016 characters on the BMP.
And that brings us back to the perennial worry some have indicated
on this list:
Is 21 bits enough for all time, or did we make a mistake and
engineer Unicode too small?
Well, space on the BMP *is* getting tight. The reserved space there
shrank another 13% for Unicode 3.2, from 7793 code points to 6777.
But in the past I have speculated that it would take 700 years at
the current rate to fill up all the available planes. Now I have
to revise that estimate up to 865 years at the current rate.
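
The arithmetic behind those figures can be sketched as follows. This is one
reading that reproduces the numbers, not necessarily the exact calculation
used: it assumes "the current rate" means the 1016 code points assigned
between Unicode 3.1 and 3.2 (roughly one year apart), and that "all the
available planes" means every code point still reserved in 3.2, on the BMP
and the supplementary planes together. The variable names are illustrative.

```python
# Reserved (unassigned) code points, taken from the table rows
# "BMP Reserved" and "Suppl Reserved".
bmp_reserved_3_1 = 7793
bmp_reserved_3_2 = 6777
suppl_reserved_3_2 = 872532

# Code points assigned on the BMP between Unicode 3.1 and 3.2.
added = bmp_reserved_3_1 - bmp_reserved_3_2

# How much the BMP reserved space shrank, as a percentage.
shrink_pct = 100 * added / bmp_reserved_3_1

# Assumption: one Unicode version per year, each assigning `added`
# code points, until nothing reserved is left.
remaining = bmp_reserved_3_2 + suppl_reserved_3_2
years_to_fill = remaining / added

print("added:", added)                          # 1016
print("BMP shrink %:", round(shrink_pct, 1))    # 13.0
print("years to fill:", int(years_to_fill))     # 865
```
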
In fact, even that may be overly optimistic. The Singapore WG2
meeting did its level best to keep up the pace, starting two new
amendments that will go into Unicode 4.0 another year after Unicode 3.2.
But the best they could come up with so far is 227 more characters
for the BMP and 682 more for Planes 1 and 14, for a grand total
of 909 additions. And they only got up to that count by the inclusion
of a big chunk of 240 variation selectors for use by the Han
character mavens to catalog Han character variants until they
get tired (or go blind -- whichever comes first, *hehe*).
So unless we seriously up the encoding pace next year, I'm afraid
we might not make our next millennium deadline to use all of the
available code points.
--Ken