Re: Displaying Plane 1 characters

From: Roman Czyborra (czyborra@cs.tu-berlin.de)
Date: Sun Nov 08 1998 - 07:58:52 EST


> What is the technical readiness in different computer environments to
> display non-BMP characters?

X11 fonts are limited to 16 bits because there is no larger index type
than the XChar2b used by XDrawString16(). Hence we will need separate
fonts for each new plane. I am not sure what XLFD (X11 Logical Font
Description) label they ought to get. Plane 0 got *-iso10646-1.
Should we register *-iso10646-2 for Plane 1, *-iso10646-3 for Plane 2
and so on? Since ISO 10646 Part 1 defined the Basic Multilingual Plane 0,
is it safe to extrapolate that ISO 10646 Part 2 will define Plane 1,
or will there never be a Part 2 but only partial Amendments?
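
To make the 16-bit limit concrete: under a one-font-per-plane scheme, a
renderer would split each code point into a plane number (selecting the
font) and a 16-bit index inside that plane, handed over as the two bytes
of an XChar2b (its byte1 and byte2 fields). The Python sketch below only
illustrates that arithmetic; the function name split_for_xchar2b is
hypothetical, it does not call Xlib, and it assumes code points are
limited to Planes +00 through +10.

```python
def split_for_xchar2b(code_point: int):
    """Split a UCS code point into (plane, byte1, byte2).

    Hypothetical helper: 'plane' would pick a per-plane font, while
    byte1/byte2 mirror the two 8-bit fields of X11's XChar2b
    (byte1 = high byte, byte2 = low byte of the 16-bit index).
    Assumes the range U-00000000..U-0010FFFF (Planes +00..+10).
    """
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError(f"outside Planes +00..+10: {code_point:#x}")
    plane = code_point >> 16       # which 64K plane, i.e. which font
    index = code_point & 0xFFFF    # 16-bit position inside that plane
    return plane, index >> 8, index & 0xFF

# U-00010208 ETRUSCAN LETTER TH: plane 1, byte1 0x02, byte2 0x08
print(split_for_xchar2b(0x10208))   # (1, 2, 8)
```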

By the way, I would prefer the name "Plane +01" to emphasize that
it is not the first plane but the second, or the first additional one.

Wouldn't the best XLFD be the following?

        *-iso10646-1 UCS Plane +00 (Basic Multilingual Plane)
        *-iso10646-01 UCS Plane +01 (Etruscan, Music, etc.)
        *-iso10646-02 UCS Plane +02
        *-iso10646-03 UCS Plane +03
        *-iso10646-04 UCS Plane +04
        *-iso10646-05 UCS Plane +05
        *-iso10646-06 UCS Plane +06
        *-iso10646-07 UCS Plane +07
        *-iso10646-08 UCS Plane +08
        *-iso10646-09 UCS Plane +09
        *-iso10646-0A UCS Plane +0A
        *-iso10646-0B UCS Plane +0B
        *-iso10646-0C UCS Plane +0C
        *-iso10646-0D UCS Plane +0D
        *-iso10646-0E UCS Plane +0E (also known as "Plane 14")
        *-iso10646-0F UCS Plane +0F (private use)
        *-iso10646-10 UCS Plane +10 (private use, >= 2^20)
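
Font selection under this naming could be sketched as a lookup from a
code point to the charset suffix in the table above. This assumes the
table's convention (the already registered "-1" for Plane +00, two
uppercase hex digits for Planes +01 to +10); these names are only a
proposal, so no X server would recognize them, and the function name
is hypothetical.

```python
def proposed_xlfd_charset(code_point: int) -> str:
    """Return the proposed XLFD charset suffix for a code point's plane.

    Hypothetical mapping following the proposal: Plane +00 keeps the
    registered "iso10646-1"; Planes +01..+10 use two uppercase hex digits.
    """
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError(f"outside the proposed table: {code_point:#x}")
    plane = code_point >> 16
    if plane == 0:
        return "iso10646-1"
    return f"iso10646-{plane:02X}"

# Font name patterns built from the proposed suffixes (illustrative only)
print("*-" + proposed_xlfd_charset(0x10208))   # *-iso10646-01
print("*-" + proposed_xlfd_charset(0xE0001))   # *-iso10646-0E
```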

http://czyborra.com/unifont/ already contains some extraplanar glyphs:
http://czyborra.com/zcat.cgi/unifont/plane+01.hex.gz
http://czyborra.com/zcat.cgi/unifont/plane+0E.hex.gz

The X11 Unicode editor Yudit <http://czyborra.com/yudit/> has not
broken the 16-bit barrier yet, so even with sufficient font support
I cannot type Etruscan.

Java is also going to run into problems: "\u10208" would be read as
U+1020 <undefined Mongolian character> U+0038 DIGIT EIGHT instead
of U-00010208 ETRUSCAN LETTER TH.
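
Python's "\u" string escape follows the same four-digit rule, so the
pitfall can be reproduced there; Python additionally has an eight-digit
"\U" escape. The sketch below is a Python illustration, not Java. (In
Java, "\u" likewise takes exactly four hex digits, and because a Java
char is 16 bits, this character has to be written as the surrogate pair
"\uD800\uDE08".)

```python
# "\u" consumes exactly four hex digits, so this literal is two
# characters: U+1020 followed by "8" (U+0038 DIGIT EIGHT).
mistaken = "\u10208"
print(len(mistaken), [f"U+{ord(c):04X}" for c in mistaken])
# 2 ['U+1020', 'U+0038']

# The eight-digit "\U" escape names the intended character directly.
intended = "\U00010208"
print(len(intended), f"U-{ord(intended):08X}")
# 1 U-00010208
```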

How is Unicode 3.0 going to deal with the extraplanar characters?
Will they be entered into the character database as UTF-16
surrogate pairs, like

D800 DE08:ETRUSCAN LETTER TH

or will all character numbers be zero-padded from four to eight
UCS-4 hex digits?

00010208:ETRUSCAN LETTER TH
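
Either database form can be derived mechanically from the other. The
UTF-16 rule subtracts 0x10000 and splits the remaining 20 bits into two
10-bit halves. A minimal Python sketch (the function names are
illustrative) reproduces the D800 DE08 pair for U-00010208 and decodes
it back:

```python
def to_surrogates(code_point: int):
    """Encode a Plane +01..+10 code point as a UTF-16 (high, low) pair."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("only U-00010000..U-0010FFFF need surrogates")
    offset = code_point - 0x10000        # 20-bit value
    high = 0xD800 + (offset >> 10)       # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)      # bottom 10 bits
    return high, low

def from_surrogates(high: int, low: int) -> int:
    """Decode a UTF-16 surrogate pair back to a UCS-4 code point."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a valid high/low surrogate pair")
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)

high, low = to_surrogates(0x10208)
print(f"{high:04X} {low:04X}")                  # D800 DE08
print(f"{from_surrogates(high, low):08X}")      # 00010208

# Cross-check against Python's own UTF-16 codec (big-endian, no BOM).
assert "\U00010208".encode("utf-16-be") == bytes.fromhex("D800DE08")
```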

The (U-)000 prefix will be redundant if all future definitions stay
within the 20-bit range addressable with UTF-16, and the leading zeroes
are awkward to type. Isn't there a shorter notation like U+G208,
<U10208>, U=010208, or U*10208?

I have seen people using a U+12345 notation even though ISO 10646 only
allows either U+1234 or U-12345678.
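
The two ISO 10646 forms and the informal five-digit variant differ only
in prefix and zero-padding. A small Python formatter shows the contrast;
it is a sketch, and both function names are hypothetical.

```python
def iso_notation(code_point: int) -> str:
    """U+XXXX inside the BMP, U-XXXXXXXX beyond it (the two ISO 10646 forms)."""
    if code_point < 0:
        raise ValueError("negative code point")
    if code_point <= 0xFFFF:
        return f"U+{code_point:04X}"
    return f"U-{code_point:08X}"

def informal_notation(code_point: int) -> str:
    """Informal 'U+' with at least four hex digits, e.g. U+10208."""
    if code_point < 0:
        raise ValueError("negative code point")
    return f"U+{code_point:04X}"

for cp in (0x0038, 0x10208, 0xE0001):
    print(iso_notation(cp), informal_notation(cp))
# U+0038 U+0038
# U-00010208 U+10208
# U-000E0001 U+E0001
```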

HTML is ready for Plane +01: you can simply use &#66056; for U-00010208.
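
An HTML numeric character reference carries the full code point (decimal
after "&#", hexadecimal after "&#x"), so the markup itself has no 16-bit
limit; whether a browser has a font for the glyph is a separate matter.
A short check with Python's standard html module:

```python
import html

code_point = 0x10208                   # U-00010208 ETRUSCAN LETTER TH
decimal_ref = f"&#{code_point};"       # 0x10208 = 65536 + 520 = 66056
hex_ref = f"&#x{code_point:X};"
print(decimal_ref, hex_ref)            # &#66056; &#x10208;

# html.unescape resolves both forms to the same single character.
assert html.unescape(decimal_ref) == html.unescape(hex_ref) == "\U00010208"
```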

Cheers, Roman http://czyborra.com/utf/
