SGML DESCSET for XML, HTML (was: XML and ISO 10646 ...)

From: John Cowan (cowan@drv.cbc.com)
Date: Tue Aug 12 1997 - 15:19:54 EDT


Since a fully correct DESCSET declaration for the codepoints of
ISO 10646 as amended would require in excess of 65536 declarations,
I propose the following compromise for XML and HTML. Members of
those mailing lists are urged to pass this suggestion on.
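
The size estimate can be checked with a little arithmetic. The sketch below assumes UCS-4 is the 31-bit space 0 through 7FFF FFFF, divided into planes of 65536 positions, and that each plane needs one mapped row plus one UNUSED row for its xxFFFE and xxFFFF positions. The variable names are illustrative only.

```python
# Assumption: UCS-4 is the 31-bit space 0..0x7FFFFFFF,
# split into planes of 0x10000 code positions each.
UCS4_LIMIT = 0x80000000
PLANE_SIZE = 0x10000

total_planes = UCS4_LIMIT // PLANE_SIZE

# One mapped range plus one UNUSED row (xxFFFE, xxFFFF) per plane.
rows_per_plane = 2
minimum_rows = total_planes * rows_per_plane

# Planes 0 through 10 (hex) are the ones reachable via UTF-16.
utf16_planes = 17
extra_rows = (total_planes - utf16_planes) * rows_per_plane

print(total_planes, minimum_rows, extra_rows)
```

The extra C0, C1 and surrogate rows in the BMP push the full count past 65536, and the rows needed beyond plane 10 (hex) come to the 65502 quoted in the declaration's comment.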

        -- This DESCSET for UCS-4 fails to record that FFFE and FFFF on
           the non-UTF-16 planes are also non-characters. To
           do so would require 65502 more lines. --

        DESCSET
             0 9 UNUSED -- C0 space --
             9 2 9 -- TAB, LF --
             11 2 UNUSED -- C0 space --
             13 1 13 -- CR --
             14 18 UNUSED -- C0 space --
             32 95 32 -- ASCII --
             127 33 UNUSED -- DEL, C1 space --
             160 55136 160 -- BMP --
           55296 2048 UNUSED -- Surrogates --
           57344 8190 57344 -- BMP --
           65534 2 UNUSED -- FFFE, FFFF --
           65536 65534 65536 -- Plane 1 --
          131070 2 UNUSED -- 1FFFE, 1FFFF --
          131072 65534 131072 -- Plane 2 --
          196606 2 UNUSED -- 2FFFE, 2FFFF --
          196608 65534 196608 -- Plane 3 --
          262142 2 UNUSED -- 3FFFE, 3FFFF --
          262144 65534 262144 -- Plane 4 --
          327678 2 UNUSED -- 4FFFE, 4FFFF --
          327680 65534 327680 -- Plane 5 --
          393214 2 UNUSED -- 5FFFE, 5FFFF --
          393216 65534 393216 -- Plane 6 --
          458750 2 UNUSED -- 6FFFE, 6FFFF --
          458752 65534 458752 -- Plane 7 --
          524286 2 UNUSED -- 7FFFE, 7FFFF --
          524288 65534 524288 -- Plane 8 --
          589822 2 UNUSED -- 8FFFE, 8FFFF --
          589824 65534 589824 -- Plane 9 --
          655358 2 UNUSED -- 9FFFE, 9FFFF --
          655360 65534 655360 -- Plane A --
          720894 2 UNUSED -- AFFFE, AFFFF --
          720896 65534 720896 -- Plane B --
          786430 2 UNUSED -- BFFFE, BFFFF --
          786432 65534 786432 -- Plane C --
          851966 2 UNUSED -- CFFFE, CFFFF --
          851968 65534 851968 -- Plane D --
          917502 2 UNUSED -- DFFFE, DFFFF --
          917504 65534 917504 -- Plane E --
          983038 2 UNUSED -- EFFFE, EFFFF --
          983040 65534 983040 -- Plane F --
         1048574 2 UNUSED -- FFFFE, FFFFF --
         1048576 65534 1048576 -- Plane 10 --
         1114110 2 UNUSED -- 10FFFE, 10FFFF --
    1114112 2146369534 1114112 -- All other planes to 7FFF FFFD --
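
A table this long is easy to get wrong by hand, so it may be worth generating and checking it mechanically. The following is only a sketch: the function names are hypothetical, and it assumes that every mapped row is meant to map a range onto itself (base equal to start) and that the rows should tile the space from 0 through 7FFF FFFD with no gaps or overlaps.

```python
UNUSED = None          # marker for rows declared UNUSED
LAST = 0x7FFFFFFD      # last position covered by the final row

def descset_rows():
    """Return (start, count, base) triples; base is None for UNUSED."""
    rows = [
        (0, 9, UNUSED), (9, 2, 9), (11, 2, UNUSED), (13, 1, 13),
        (14, 18, UNUSED), (32, 95, 32), (127, 33, UNUSED),
        (160, 0xD800 - 160, 160),            # BMP below the surrogates
        (0xD800, 0x800, UNUSED),             # surrogates
        (0xE000, 0xFFFE - 0xE000, 0xE000),   # BMP above the surrogates
        (0xFFFE, 2, UNUSED),
    ]
    for plane in range(1, 17):               # planes 1 through 10 (hex)
        start = plane * 0x10000
        rows.append((start, 0xFFFE, start))
        rows.append((start + 0xFFFE, 2, UNUSED))
    rows.append((0x110000, LAST + 1 - 0x110000, 0x110000))
    return rows

def check(rows):
    """Check contiguity and identity mapping; return first uncovered position."""
    expected = 0
    for start, count, base in rows:
        assert start == expected, (start, expected)
        assert count > 0
        assert base is None or base == start   # assumed: identity maps only
        expected = start + count
    return expected

def format_row(start, count, base):
    """Render one row in DESCSET syntax, without the trailing comment."""
    return f"{start} {count} {'UNUSED' if base is None else base}"

rows = descset_rows()
check(rows)
for row in rows:
    print(format_row(*row))
```

Running check() over rows typed in from the declaration, rather than over generated ones, would flag any row whose start, count or base has drifted.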

-- 
John Cowan	http://www.ccil.org/~cowan		cowan@ccil.org
			e'osai ko sarji la lojban


