Unicode character encoding statistics

From: Kenneth Whistler (kenw@sybase.com)
Date: Fri Feb 16 2001 - 14:26:25 EST


BTW, if anyone was wondering where I came up with the
figure of 880,325 reserved, unassigned code points for Unicode
3.1 (BMP Reserved plus Suppl Reserved), here are the complete
statistics for Unicode 3.0 and Unicode 3.1:

Unicode:                    U 3.0      U 3.1

BMP Alphas/Symbols          10236      10238
Suppl Alphas/Symbols            -       1691
Han (URO)                   20902      20902
Han (Ext A)                  6582       6582
Han (Ext B)                     -      42711
Han Compat                    302        302
Suppl Han Compat                -        542
Hangul Syllables            11172      11172

Subtotal                    49194      94140

BMP Private Use              6400       6400
Suppl Private Use          131068     131068
Surrogate Code Points        2048       2048
Controls                       65         65
BMP Noncharacters               2         34
Suppl Noncharacters            32         32
BMP Reserved                 7827       7793
Suppl Reserved             917476     872532

The total number of code points accounted for
here is 1,114,112 (= 17 x 64K), i.e.
U+0000..U+10FFFF.
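
For anyone who wants to re-check the arithmetic, here is a minimal
Python sketch that sums the two columns. The counts are transcribed
from the table; the names ASSIGNED, OTHER and column_totals are
illustrative only, and categories with no Unicode 3.0 entry are
assumed to be 0.

```python
# Counts transcribed from the table, as (Unicode 3.0, Unicode 3.1).
# Assumption: categories with no 3.0 entry are entered as 0.
ASSIGNED = {
    "BMP Alphas/Symbols":   (10236, 10238),
    "Suppl Alphas/Symbols": (0, 1691),
    "Han (URO)":            (20902, 20902),
    "Han (Ext A)":          (6582, 6582),
    "Han (Ext B)":          (0, 42711),
    "Han Compat":           (302, 302),
    "Suppl Han Compat":     (0, 542),
    "Hangul Syllables":     (11172, 11172),
}

OTHER = {
    "BMP Private Use":       (6400, 6400),
    "Suppl Private Use":     (131068, 131068),
    "Surrogate Code Points": (2048, 2048),
    "Controls":              (65, 65),
    "BMP Noncharacters":     (2, 34),
    "Suppl Noncharacters":   (32, 32),
    "BMP Reserved":          (7827, 7793),
    "Suppl Reserved":        (917476, 872532),
}


def column_totals(rows):
    """Sum each version's column over all rows of a table section."""
    return tuple(sum(column) for column in zip(*rows.values()))


# Subtotal of assigned graphic characters per version.
subtotal = column_totals(ASSIGNED)

# Grand total of code points accounted for per version.
total = tuple(a + b for a, b in zip(subtotal, column_totals(OTHER)))

# Reserved (unassigned) code points: BMP plus supplementary planes.
reserved = tuple(
    bmp + suppl
    for bmp, suppl in zip(OTHER["BMP Reserved"], OTHER["Suppl Reserved"])
)

print("Subtotal:", subtotal)   # (49194, 94140)
print("Total:   ", total)      # (1114112, 1114112)
print("Reserved:", reserved)   # (925303, 880325)
```

Both columns come to 17 x 65536 = 1,114,112, and the 3.1 reserved
count is the 880,325 quoted at the top.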

--Ken


