Re: proposal: UTF-20

From: schererm@us.ibm.com
Date: Mon Jan 25 1999 - 13:36:56 EST


Well, I would like to see the same limitation in both Unicode and ISO
10646.

I am trying to approach this from an implementer's point of view, and I see
some merit in limiting code points to 20 bits; that motivates me more than
this particular encoding does.
Maybe it is just my gut feeling for "elegance" in encodings running wild?

- Bit-field storage would not need a 21st bit that is hardly ever used.
- An extension of HTML and Java escape codes like "\u20ac" could get by with
5 hex digits instead of 6. (I am not a big fan of the assumption made in
the existing escape codes, namely that the target encoding is UTF-16, where
a surrogate pair forms a valid non-BMP character.) The 6th hexadecimal digit
can only be 0 or 1, and a 1 is rarely needed in that position.
  (Having to write non-BMP characters as pairs of escape codes is
counter-intuitive and forces people to learn the surrogate pair mechanism.)
- An encoding like this "UTF-20" would be possible.
- "Aesthetics" - I am looking more at the scalar values than at the default
encoding, UTF-16.
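A minimal Python sketch of the escape-code argument, for illustration only. `utf16_escapes` produces Java-style `\uXXXX` escapes, which force a surrogate pair above U+FFFF; `five_digit_hex` is a hypothetical helper (no concrete escape syntax is implied) showing that a single fixed 5-digit hex field covers every code point in planes 0 to 15, while plane 16 would need a sixth digit.

```python
def utf16_escapes(cp: int) -> str:
    r"""Java-style escape(s): one \uXXXX in the BMP, a surrogate pair above it."""
    if not 0 <= cp <= 0x10FFFF:
        raise ValueError("outside the Unicode code space")
    if cp <= 0xFFFF:
        return "\\u%04x" % cp
    v = cp - 0x10000              # 20-bit offset of cp within planes 1..16
    high = 0xD800 + (v >> 10)     # high surrogate carries the top 10 bits
    low = 0xDC00 + (v & 0x3FF)    # low surrogate carries the bottom 10 bits
    return "\\u%04x\\u%04x" % (high, low)


def five_digit_hex(cp: int) -> str:
    """Hypothetical fixed-width scalar form: 5 hex digits, planes 0-15 only."""
    if not 0 <= cp <= 0xFFFFF:
        raise ValueError("needs a 6th hex digit (plane 16) or is out of range")
    return "%05X" % cp


if __name__ == "__main__":
    # U+20AC EURO SIGN and U+1D11E MUSICAL SYMBOL G CLEF
    for cp in (0x20AC, 0x1D11E):
        print("U+%04X" % cp, five_digit_hex(cp), utf16_escapes(cp))
```

The non-BMP example shows the contrast the message draws: one 5-digit scalar value versus two 4-digit escapes whose meaning depends on the surrogate mechanism.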

The only reason for having 17 planes instead of 16 seems to be "because
that is what UTF-16 can reach" (I like the UTF-16 encoding very much!).
Only 6 of them are assigned to anything; the other 11 planes, with 704k code
points, are reserved. My proposal would turn one reserved plane into a
private use plane and re-reserve private use plane 16, or just discourage
its use; private use space is, of course, still available elsewhere, such
as in the higher private use planes and groups of ISO 10646.
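The plane arithmetic above, checked in Python. The figure of six assigned planes is taken from the message rather than computed, and `plane_of` is a hypothetical helper name.

```python
PLANE_SIZE = 0x10000      # 65,536 code points per plane
TOTAL_PLANES = 17         # planes 0..16, i.e. U+0000..U+10FFFF
ASSIGNED_PLANES = 6       # number stated in the message, not derived here
RESERVED_PLANES = TOTAL_PLANES - ASSIGNED_PLANES


def plane_of(cp: int) -> int:
    """Plane number of a code point: everything above the low 16 bits."""
    return cp >> 16


reserved = RESERVED_PLANES * PLANE_SIZE
print(RESERVED_PLANES, reserved, reserved // 1024)  # 11 720896 704

# Plane 16, the one the proposal would give up, spans U+100000..U+10FFFF.
first16, last16 = 16 * PLANE_SIZE, 17 * PLANE_SIZE - 1
print("U+%X..U+%X" % (first16, last16), plane_of(first16), plane_of(last16))
```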

My impression is that the joint Unicode/ISO 10646 effort has already gone
through a phase of compromise between the wish for a linear 16-bit encoding
for everything and the realization that there are more than 64k characters
to encode, arriving at a code point range (U+0000..U+10FFFF) that takes
about 20.1 bits of information, or 21 bits of storage in practice.
I think that implementers would be happier with a limit of 20 bits.
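The "20.1 respectively 21b" figures, and what a 20-bit limit would buy in the bit-field storage case from the list above, can be sketched in Python. The packed layout (code point in the low 20 bits, 12 flag bits above it in a 32-bit word) is a hypothetical example, not an existing format.

```python
import math

FULL_SPACE = 0x110000      # 17 planes: U+0000..U+10FFFF
PROPOSED_SPACE = 0x100000  # 16 planes: U+0000..U+FFFFF

# Information content vs. storage width of each code space.
print(round(math.log2(FULL_SPACE), 1))        # about 20.1 bits
print((FULL_SPACE - 1).bit_length())          # 21 bits to store U+10FFFF
print((PROPOSED_SPACE - 1).bit_length())      # 20 bits to store U+FFFFF

CP_BITS = 20
CP_MASK = (1 << CP_BITS) - 1
FLAG_BITS = 32 - CP_BITS   # 12 bits left over in a 32-bit word


def pack(cp: int, flags: int) -> int:
    """Hypothetical record: flags in the high 12 bits, code point in the low 20."""
    if not 0 <= cp <= CP_MASK:
        raise ValueError("code point needs more than 20 bits")
    if not 0 <= flags < (1 << FLAG_BITS):
        raise ValueError("flags need more than 12 bits")
    return (flags << CP_BITS) | cp


def unpack(word: int) -> tuple[int, int]:
    """Inverse of pack: returns (code point, flags)."""
    return word & CP_MASK, word >> CP_BITS
```

Under the current 17-plane limit the same layout would need `CP_BITS = 21`, leaving only 11 flag bits, even though the 21st bit is set only for plane 16.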

I have high respect for everything(!) that has been done to create this
character set.

Sincerely,

markus

Markus Scherer IBM RTP +1 919 486 1135 Dept. Fax +1 919 254 6430
schererm@us.ibm.com

Michael Everson <everson@indigo.ie> on 99-01-25 12:20:23
Subject: Re: proposal: UTF-20


At 08:09 -0800 1999-01-25, schererm@us.ibm.com wrote:

>I propose the UTF-20 encoding for Unicode/ISO-10646 as a compromise between
>compactness and the use of scalar integers for all characters in planes 0
>to 15. Planes 16 and above are not accessible.

Then it doesn't have much to do with ISO/IEC 10646, does it.

--
Michael Everson, Everson Gunn Teoranta ** http://www.indigo.ie/egt
15 Port Chaeimhghein Íochtarach; Baile Átha Cliath 2; Éire/Ireland
Phone: +353 1 478-2597 ** Fax: +353 1 478-2597 (by arrangement)
27 Páirc an Fhéithlinn;  Baile an Bhóthair;  Co. Átha Cliath; Éire


