64K Tables

From: Kenneth Whistler (kenw@sybase.com)
Date: Tue Feb 04 1997 - 17:07:57 EST


Keld commented (re use of ISO 14651 tables making for a non-small grep):

> ISO 14651 does not require ISO 10646, but can also be used in
> an ISO 8859-1 or other 8-bit environment, without need for 64 k
> tables.

To clarify the "64K table" concept (which is still widely cited
as a barrier to implementation, as above): tables that support the
full Unicode set, whether for conversion, character properties,
collation, or anything else, are *never* 64K tables if properly
implemented. There is a clear distinction between the number of
index values a table covers and the actual space required to
store it.
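
To make the distinction concrete, here is a minimal sketch of one
well-known compaction technique, a two-stage table. It is not the
implementation described in this message; the block size of 256, the
function names, and the use of Python's unicodedata general category
as a stand-in property are all assumptions chosen for illustration.
The table is indexed by all 65,536 values, but identical 256-entry
blocks are stored only once.

```python
import unicodedata

BLOCK_BITS = 8                  # 256 code points per block (assumed choice)
BLOCK_SIZE = 1 << BLOCK_BITS
BMP_SIZE = 0x10000              # 65,536 index values

def build_two_stage(prop, size=BMP_SIZE):
    """Build (stage1, stage2) for a property function returning 0..255."""
    stage1 = bytearray()        # one byte per block: which stored block to use
    stage2 = bytearray()        # the distinct blocks, concatenated
    seen = {}                   # block contents -> block number in stage2
    for start in range(0, size, BLOCK_SIZE):
        block = bytes(prop(cp) for cp in range(start, start + BLOCK_SIZE))
        if block not in seen:   # store each distinct block only once
            seen[block] = len(stage2) // BLOCK_SIZE
            stage2 += block
        stage1.append(seen[block])
    return bytes(stage1), bytes(stage2)

def lookup(stage1, stage2, cp):
    """Two array reads: pick the block, then the entry inside it."""
    block = stage1[cp >> BLOCK_BITS]
    return stage2[(block << BLOCK_BITS) | (cp & (BLOCK_SIZE - 1))]

# Hypothetical stand-in property: general category as a small integer.
CATEGORIES = sorted({unicodedata.category(chr(cp)) for cp in range(BMP_SIZE)})
CAT_INDEX = {name: i for i, name in enumerate(CATEGORIES)}

def category_id(cp):
    return CAT_INDEX[unicodedata.category(chr(cp))]

stage1, stage2 = build_two_stage(category_id)
total_bytes = len(stage1) + len(stage2)
```

Printing total_bytes shows the storage actually used. The exact figure
depends on the Unicode data shipped with the Python in use, but long
uniform ranges such as the CJK ideographs collapse to a single stored
block, so it stays below the 65,536 bytes a flat one-byte-per-character
table would need.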

One actual example: I have implemented a table which stores 41
distinct properties for all Unicode characters in Unicode 2.0,
including all the bidi properties. That table comprises 28,954 bytes.
Special-purpose tables that attempt less can be done in much less
space, depending on their purpose.

There are a number of papers published in the proceedings of the
various International Unicode Conferences which discuss easy-to-
implement algorithms for compact tables appropriate for Unicode usage.

--Ken Whistler


