From: jcowan@reutershealth.com
Date: Thu Mar 18 2004 - 11:58:24 EST
Arcane Jill scripsit:
> Why are characters being assigned codepoints > U+FFFF, when
> there is still loads and loads of unused empty space below that point.
In fact the BMP is currently 87.5% full. When the 32 remaining blocks
currently shown on the Roadmap are completed, it will be almost 99% full.
> Is the BMP being saved for something? Are codepoints < U+010000
> reserved for something of which I am unaware? If so, what? If not, why
> are assignments being made up there in the astral planes?
Supplementary codepoints are used for characters that are judged to be
of very low overall frequency, or that are part of very large character
repertoires, or that are used only in obsolete scripts.
> By my calculations, the total number of currently existent Unicode
> characters is < 0x10000, which means that - currently - ALL existent
> Unicode characters could have been encoded in 16-bits.
Actually not. Unicode 4.0 contains 96447 graphic, format, and control
characters, almost half again as large as the BMP. In addition, there
are currently 139582 private-use characters, reserved noncharacters,
and surrogate codepoints, of which 8482 are on the BMP.
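The reserved totals above can be rechecked from the fixed layout of the code space. What follows is a minimal sketch, assuming the standard ranges: private use at U+E000..U+F8FF plus planes 15 and 16, surrogates at U+D800..U+DFFF, and noncharacters at U+FDD0..U+FDEF plus the last two codepoints of each of the 17 planes. The constant and function names are illustrative only and come from no library.

```python
# Counts derived from the fixed structure of the Unicode code space,
# not from any version's character database. All names are illustrative.

BMP_SIZE = 0x10000                       # 65536 codepoints per plane
PLANES = 17                              # planes 0 through 16

BMP_PRIVATE_USE = 0xF8FF - 0xE000 + 1    # U+E000..U+F8FF
# Planes 15 and 16 are private use, except that the last two codepoints
# of every plane (xxFFFE, xxFFFF) are noncharacters.
SUPPLEMENTARY_PRIVATE_USE = 2 * (BMP_SIZE - 2)

SURROGATES = 0xDFFF - 0xD800 + 1         # U+D800..U+DFFF

FDD0_BLOCK = 0xFDEF - 0xFDD0 + 1         # U+FDD0..U+FDEF
BMP_NONCHARACTERS = FDD0_BLOCK + 2       # plus U+FFFE and U+FFFF
ALL_NONCHARACTERS = FDD0_BLOCK + 2 * PLANES


def reserved_counts():
    """Return (total, on_bmp) counts of private-use, noncharacter,
    and surrogate codepoints."""
    total = (BMP_PRIVATE_USE + SUPPLEMENTARY_PRIVATE_USE
             + SURROGATES + ALL_NONCHARACTERS)
    on_bmp = BMP_PRIVATE_USE + SURROGATES + BMP_NONCHARACTERS
    return total, on_bmp


# Figure quoted in the message for Unicode 4.0 (taken as given, not derived).
UNICODE_4_0_CHARACTERS = 96447

print(reserved_counts())                      # (139582, 8482)
print(UNICODE_4_0_CHARACTERS > BMP_SIZE)      # True: 16 bits cannot hold them all
```

Under those assumptions the sums reproduce both figures, and the 96447 count alone already exceeds the 65536 codepoints a 16-bit space provides.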
> I don't understand why they don't just get
> assigned in ascending numerical order on a first-come first-served basis.
Partly for administrative convenience, partly because keeping related
characters together allows character-property tables to be cleverly
compressed.
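To illustrate the compression point, here is a small sketch using plain run-length encoding over a toy property table. The data is hypothetical (eight "L" entries and eight "N" entries, not real Unicode properties), and real implementations may well use other schemes such as multi-stage tables; this only shows why contiguous allocation helps.

```python
from bisect import bisect_right


def build_runs(props):
    """Collapse a per-codepoint list of property values into
    (start, value) runs, one entry per change of value."""
    runs = []
    for cp, value in enumerate(props):
        if not runs or runs[-1][1] != value:
            runs.append((cp, value))
    return runs


def lookup(runs, cp):
    """Find the property of codepoint cp by binary search over run starts."""
    starts = [start for start, _ in runs]
    return runs[bisect_right(starts, cp) - 1][1]


# Toy tables: the same 16 values, grouped versus interleaved.
grouped = ["L"] * 8 + ["N"] * 8
interleaved = ["L", "N"] * 8

grouped_runs = build_runs(grouped)
interleaved_runs = build_runs(interleaved)

print(len(grouped_runs), len(interleaved_runs))   # 2 16
print(lookup(grouped_runs, 9))                    # N
```

With related entries kept together the table shrinks to two runs; with the same entries interleaved it needs sixteen, one per codepoint.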
--
John Cowan <jcowan@reutershealth.com>
http://www.reutershealth.com
http://www.ccil.org/~cowan
The Imperials are decadent, 300 pound free-range chickens (except they have
teeth, arms instead of wings and dinosaurlike tails). --Elyse Grasso