free code values in unicode 3.0

From: schererm@us.ibm.com
Date: Wed Mar 03 1999 - 11:13:45 EST


I am trying to count how many code values will remain
free for future assignment after Unicode 3.0.
Kenneth Whistler recently gave the number 7815 for
reserved values in the BMP.

However, as Michael Everson shows in a nice way
(http://www.indigo.ie/egt/standards/iso10646/ucs-roadmap.html),
many reserved values are scattered among partly assigned blocks.

As he suggests, I have counted the "columns" that are entirely
free. (A column is a set of 16 cells, i.e. code values, that is
16-aligned: cells ...0 through ...f.)
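
As a rough illustration of the column definition above, here is a small
Python sketch. The function name free_columns is made up for this
example; it only checks the 16-alignment and divides the range length
by 16.

```python
def free_columns(first: int, last: int) -> int:
    """Count whole 16-cell columns in the inclusive range first..last.

    Hypothetical helper: a range qualifies only if it starts on a
    ...0 cell and ends on a ...f cell.
    """
    if first % 16 != 0 or last % 16 != 15 or last < first:
        raise ValueError("range must run from a ...0 cell to a ...f cell")
    return (last - first + 1) // 16


print(free_columns(0x0240, 0x024F))  # a single column
print(free_columns(0x18B0, 0x1DFF))  # a long run of free columns
```

For 0240-024f this gives 1, and for 18b0-1dff it gives 85, matching the
entries in the list.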

Below are the columns that I found, looking through the
1999dec15 draft for the Unicode 3.0 book.

The total number of free columns is:

    400 columns * 16 = 6400 code values

Assuming that columns "in" named and used blocks are
not going to be used for new scripts, this is
the total number of free columns for new scripts:

    353 columns * 16 = 5648 code values

(the "between" columns)

Not all of these can be used for just any kind of script
if scripts continue to be grouped by common
features.

This is not very much?!

Is this a reasonable and correct count?

markus

columns # (in or between blocks)

0240-024f 1 (in Latin Extended-B)
02f0-02ff 1 (in Spacing Modifier Letters)
0350-035f 1 (in Combining Diacritical Marks)
0500-052f 3 (between Cyrillic and Armenian)
0750-077f 3 (between Syriac and Thaana)

07c0-08ff 20 (between Thaana and Devanagari)
0af0-0aff 1 (in Gujarati)
0c70-0c7f 1 (in Telugu)
0cf0-0cff 1 (in Kannada)
0d70-0d7f 1 (in Malayalam)
0de0-0def 1 (in Sinhala)
0e60-0e7f 2 (in Thai)
0ee0-0eff 2 (in Lao)
0fd0-0fff 3 (in Tibetan)
1060-109f 4 (in Myanmar)
1380-139f 2 (between Ethiopic and Cherokee)
1700-177f 8 (between Runic and Khmer)
18b0-1dff 85 (between Mongolian and Latin Extended Additional)

2050-205f 1 (in General Punctuation)
2090-209f 1 (in Superscripts and Subscripts)
20b0-20cf 2 (in Currency Symbols)
20f0-20ff 1 (in Combining Diacritical Marks for Symbols)
2140-214f 1 (in Letterlike Symbols)
23a0-23ff 6 (in Miscellaneous Technical)
2430-243f 1 (in Control Pictures)
2450-245f 1 (in Optical Character Recognition)
24f0-24ff 1 (in Enclosed Alphanumerics)
2680-26ff 8 (between Miscellaneous Symbols and Dingbats)
27c0-27ff 4 (between Dingbats and Braille Patterns)
2900-2e7f 88 (between Braille Patterns and CJK Radicals Supplement)
2fe0-2fef 1 (between Kangxi Radicals and Ideographic Description Characters)

31c0-31ff 4 (between Extended Bopomofo and Enclosed CJK Letters and Months)
3250-325f 1 (in Enclosed CJK Letters and Months)
9fb0-9fff 5 (in CJK Unified Ideographs)
a4d0-abff 115 (between Yi Radicals and Hangul Syllables)

fa30-fa5f 3 (in CJK Compatibility Ideographs)
fa60-faff 10 (between CJK Compatibility Ideographs and Alphabetic Presentation Forms)
fbc0-fbcf 1 (in Arabic Presentation Forms-A)
fd40-fd4f 1 (in Arabic Presentation Forms-A)
fdd0-fdef 2 (in Arabic Presentation Forms-A)
fe00-fe1f 2 (between Arabic Presentation Forms-A and Combining Half Marks)
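
For anyone who wants to recheck the totals, the list above can be
tallied mechanically. This is only a sketch: the ranges are copied from
the list, the "in"/"between" labels follow the remarks in parentheses,
and the names TABLE and tally are invented for the example.

```python
# Ranges copied from the list above; the second field says whether the
# free columns lie "in" a named block or "between" two blocks.
TABLE = """\
0240-024f in
02f0-02ff in
0350-035f in
0500-052f between
0750-077f between
07c0-08ff between
0af0-0aff in
0c70-0c7f in
0cf0-0cff in
0d70-0d7f in
0de0-0def in
0e60-0e7f in
0ee0-0eff in
0fd0-0fff in
1060-109f in
1380-139f between
1700-177f between
18b0-1dff between
2050-205f in
2090-209f in
20b0-20cf in
20f0-20ff in
2140-214f in
23a0-23ff in
2430-243f in
2450-245f in
24f0-24ff in
2680-26ff between
27c0-27ff between
2900-2e7f between
2fe0-2fef between
31c0-31ff between
3250-325f in
9fb0-9fff in
a4d0-abff between
fa30-fa5f in
fa60-faff between
fbc0-fbcf in
fd40-fd4f in
fdd0-fdef in
fe00-fe1f between
"""


def tally(table: str) -> dict:
    """Sum the number of 16-cell columns per kind ("in" / "between")."""
    totals = {"in": 0, "between": 0}
    for line in table.splitlines():
        span, kind = line.split()
        first, last = (int(h, 16) for h in span.split("-"))
        totals[kind] += (last - first + 1) // 16
    return totals


totals = tally(TABLE)
all_columns = totals["in"] + totals["between"]
print(all_columns, all_columns * 16)              # 400 6400
print(totals["between"], totals["between"] * 16)  # 353 5648
```

Run as is, this reproduces the 400 columns (6400 code values) overall
and the 353 "between" columns (5648 code values) given earlier.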

Markus Scherer IBM RTP +1 919 486 1135 Dept. Fax +1 919 254 6430
schererm@us.ibm.com
                        Unicode is here! --> http://www.unicode.org/


