Re: Name Compression. Comparison and Tweaks

From: Asmus Freytag (asmusf@ix.netcom.com)
Date: Sat May 13 2000 - 18:51:11 EDT


I'm becoming concerned about the usage aspect of all of this. There are
some characters in Unicode where the official name is decidedly NOT the
best name for general use. Usually this is because the name that the
committee used was more than a bit arbitrary (or even wrong) or it is based
on appearance, when most users know it by function (or vice versa).

The annotated nameslist that we use to print the Unicode Standard remedies
that by providing a set of aliases in these cases. I strongly urge that
developers of any sort of 'character name lookup' for end users consider
ways to make these aliases accessible. The annotated nameslist is available
online and on the CD-ROM that comes with the book.
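
To make the suggestion concrete, here is a rough sketch of a lookup index that treats aliases as first-class search keys. It assumes the plain-text names list layout in which a character line is "code<TAB>NAME" and an alias line is "<TAB>= alias, alias". The SAMPLE text is only an illustrative excerpt in that style (alias wording may differ between versions of the list), and the function names parse_nameslist and build_index are made up for this example.

```python
# Illustrative excerpt in the style of the annotated names list.
# The exact alias wording is an assumption and may differ by version.
SAMPLE = (
    "0023\tNUMBER SIGN\n"
    "\t= pound sign, hash, crosshatch, octothorpe\n"
    "002E\tFULL STOP\n"
    "\t= period, dot, decimal point\n"
)


def parse_nameslist(text):
    """Return {code point: (official name, [aliases])}.

    Only character lines and '=' alias lines are handled; header,
    comment, and other annotation lines are skipped.
    """
    entries = {}
    current = None
    for line in text.splitlines():
        if not line or line[0] in "@;":
            continue  # block headers and comments
        if line[0] != "\t":
            code, _, name = line.partition("\t")
            current = int(code, 16)
            entries[current] = (name.strip(), [])
        elif line.startswith("\t= ") and current is not None:
            aliases = [a.strip() for a in line[3:].split(",")]
            entries[current][1].extend(aliases)
    return entries


def build_index(entries):
    """Map every official name and alias (upper-cased) to code points."""
    index = {}
    for cp, (name, aliases) in entries.items():
        for label in [name] + aliases:
            index.setdefault(label.upper(), set()).add(cp)
    return index


index = build_index(parse_nameslist(SAMPLE))
print(sorted(index["PERIOD"]))  # the alias finds U+002E
```

With an index like this, a user who types "period" or "octothorpe" reaches the same character as one who knows the official name.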

>1. The names of C0 and C1 controls (from Unicode 1.0) are also
>encoded.

Looks like Torsten did this in a limited way.

>2. Names of CJK COMPATIBILITY IDEOGRAPHs (F900 -> FA2D) are derived
>algorithmically, saving 9966 bytes.
>
>3. Names of BRAILLE PATTERNs are also derived algorithmically, saving
>6656 bytes. The compiled code for this algorithm takes less than 1 KB.
>5.5 KB total saving is not much, but I couldn't resist :)
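
For readers following along, the two derivations quoted above can be sketched roughly as below. This is not the poster's code, only a minimal illustration of the naming rules: a Braille pattern name lists the dot numbers of the set bits (bit 0 is dot 1), and a compatibility ideograph name is a fixed prefix followed by the code point in hexadecimal. The range F900 to FA2D is the one given in the quoted message; the function names are made up for this example.

```python
def braille_name(cp):
    """Derive the name of a BRAILLE PATTERN character (U+2800..U+28FF)."""
    if not 0x2800 <= cp <= 0x28FF:
        raise ValueError("not a Braille pattern: U+%04X" % cp)
    bits = cp - 0x2800
    if bits == 0:
        return "BRAILLE PATTERN BLANK"
    # Bit i (counting from 0) corresponds to dot i + 1.
    dots = "".join(str(i + 1) for i in range(8) if bits & (1 << i))
    return "BRAILLE PATTERN DOTS-" + dots


def cjk_compat_name(cp):
    """Derive the name of a CJK COMPATIBILITY IDEOGRAPH (range as quoted)."""
    if not 0xF900 <= cp <= 0xFA2D:
        raise ValueError("outside the quoted range: U+%04X" % cp)
    return "CJK COMPATIBILITY IDEOGRAPH-%04X" % cp


print(braille_name(0x2813))     # BRAILLE PATTERN DOTS-125
print(cjk_compat_name(0xF900))  # CJK COMPATIBILITY IDEOGRAPH-F900
```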

Do you algorithmically generate the names for Hangul Syllables as well?
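
For reference, the derivation I have in mind is the arithmetic decomposition described in the Unicode Standard: split the syllable index into leading consonant, vowel, and trailing consonant, then concatenate their short names. A minimal sketch follows; the function name hangul_syllable_name is made up for this example, and I am not assuming anything about how the code under discussion is organized.

```python
S_BASE = 0xAC00
L_COUNT, V_COUNT, T_COUNT = 19, 21, 28
N_COUNT = V_COUNT * T_COUNT   # 588 syllables per leading consonant
S_COUNT = L_COUNT * N_COUNT   # 11172 syllables in total

# Short names of the jamo, in index order.
JAMO_L = ["G", "GG", "N", "D", "DD", "R", "M", "B", "BB", "S",
          "SS", "", "J", "JJ", "C", "K", "T", "P", "H"]
JAMO_V = ["A", "AE", "YA", "YAE", "EO", "E", "YEO", "YE", "O", "WA",
          "WAE", "OE", "YO", "U", "WEO", "WE", "WI", "YU", "EU", "YI", "I"]
JAMO_T = ["", "G", "GG", "GS", "N", "NJ", "NH", "D", "L", "LG",
          "LM", "LB", "LS", "LT", "LP", "LH", "M", "B", "BS", "S",
          "SS", "NG", "J", "C", "K", "T", "P", "H"]


def hangul_syllable_name(cp):
    """Derive the name of a precomposed Hangul syllable (U+AC00..U+D7A3)."""
    s_index = cp - S_BASE
    if not 0 <= s_index < S_COUNT:
        raise ValueError("not a Hangul syllable: U+%04X" % cp)
    l_index = s_index // N_COUNT
    v_index = (s_index % N_COUNT) // T_COUNT
    t_index = s_index % T_COUNT
    return ("HANGUL SYLLABLE "
            + JAMO_L[l_index] + JAMO_V[v_index] + JAMO_T[t_index])


print(hangul_syllable_name(0xAC00))  # HANGUL SYLLABLE GA
print(hangul_syllable_name(0xD55C))  # HANGUL SYLLABLE HAN
```

Three small tables replace 11172 stored names, which is why this range is the obvious candidate for the same treatment.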

A./


