Re: XML and ISO 10646 planes beyond the BMP

From: Kenneth Whistler (
Date: Mon Aug 11 1997 - 17:30:31 EDT

John Cowan recalculated:

> Misha Wolf wrote:
> > S2. In both these places, amend the DESCSET accordingly: In the "SGML
> > Declaration for XML", change the "65376" to "2147483488". In the
> > box labelled "Scope Document", amend the "65536" to "2147483648".
> > The SGML Declaration for HTML 4.0 has "2147483486" rather than
> > "2147483488" (see the least significant digit), but I'll seek to
> > have that changed.
> By my reading of clause 7.1 of ISO/IEC 10646:1993, the correct
> number of codepoints is 2147483646 (from 0000 0000 to 7FFF FFFD inclusive).
> Deducting 160 for the C0, ASCII, and C1 ranges leaves 2147483486,
> in conformity with HTML 4.0.
> --
> John Cowan

The math I do varies slightly yet again. Based on the latest draft
of 10646, with corrigenda, Clause 7a specifies:

   The values of P-, and R-, and C-octets used for representing graphic
   characters shall be in the range 00 to FF. The values of G-octets
   used for representation of graphic characters shall be in the range
   of 00 to 7F. On any plane, code positions FFFE and FFFF shall not
   be used.

That gives the basic size of the coding space as 2 gig, but minus
two characters per plane.

Clause 7b specifies:

   Code positions to which a character is not allocated, except for
   the positions reserved for private use character or for
   transformation formats, are reserved for future standardisation
   and shall not be used for any other purpose.

The important fact here is that U+D800..U+DFFF on the BMP do not
represent characters per se, but are reserved for the UTF-16
transformation format.

So I get:

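    7FFF planes x FFFE chars/plane = 7FFE0002
    1 plane BMP x F7FE chars/plane =     F7FE
                                     7FFEF800 => 2,147,416,064

(FFFE is 10000 less the two unusable positions per plane; F7FE is
FFFE less the 800 positions D800..DFFF on the BMP.)

If you subtract off 160 for C0,ASCII,C1, you get 2,147,415,904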

On the other hand, the range within which valid values are
expressible is 00000000 .. 7FFFFFFD ( 0 .. 2,147,483,645 ), i.e.
2,147,483,646 values.
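
As a cross-check, here is a minimal Python sketch of both counts
discussed above: the plane-by-plane tally and the size of the
expressible range. The names are illustrative and not taken from 10646.
The per-plane figure is left as a parameter because the result depends
on it: positions 0000 through FFFD inclusive number FFFE, and using
FFFD instead lowers the total by one per plane, or 8000 hex overall.

```python
# Sketch only: constant and function names are illustrative, not from 10646.
GROUPS = 0x80                      # G-octet values 00..7F
PLANES_PER_GROUP = 0x100           # P-octet values 00..FF
SURROGATES = 0xDFFF - 0xD800 + 1   # D800..DFFF on the BMP (0x800 positions)
CONTROLS_AND_ASCII = 160           # C0 + ASCII + C1, i.e. 0000..009F


def tally(per_plane):
    """Plane-by-plane count, given the usable positions per plane."""
    planes = GROUPS * PLANES_PER_GROUP        # 0x8000 planes in all
    other_planes = (planes - 1) * per_plane   # every plane except the BMP
    bmp = per_plane - SURROGATES              # the BMP also loses D800..DFFF
    return other_planes + bmp


def range_size(first, last):
    """Number of values in the inclusive range first..last."""
    return last - first + 1


if __name__ == "__main__":
    for per_plane in (0xFFFE, 0xFFFD):
        total = tally(per_plane)
        print(f"{per_plane:04X}/plane: {total:08X} = {total:,}; "
              f"less 160 = {total - CONTROLS_AND_ASCII:,}")
    span = range_size(0x00000000, 0x7FFFFFFD)
    print(f"range: {span:,}; less 160 = {span - CONTROLS_AND_ASCII:,}")
```

The last line reproduces John's figures (2,147,483,646 and, less 160,
2,147,483,486), which count every value in the range without removing
FFFE/FFFF on each plane or the D800..DFFF block.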

It is not clear to me exactly which of these numbers is needed
for the DESCSET in XML.

