mohrin@sharmahd.com (Torsten Mohrin) writes:
TM> In SC UniPad we use a compressed name table. The names are compressed
TM> by encoding the words either in one or two bytes. The separators
TM> (space and hyphen-minus) are encoded in a special way. It works as
TM> follows:
[explanation snipped]
Why not use Huffman encoding? You could precompute the Huffman tables
once and for all, compile them into your program, and only do the
actual encoding/decoding at runtime.
It would be slightly more expensive computationally than your scheme,
because the codes no longer fall on byte boundaries and have to be
extracted bit by bit, but it should yield a much better compression
ratio.
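To make the suggestion concrete, here is a rough sketch in Python, not a description of how UniPad or any other product does it. It assumes a character-level Huffman code (a word-level code would work the same way), and SAMPLE_NAMES, build_code, encode and decode are names made up for this illustration. In practice the table would be computed once from the complete name list and compiled into the program as a constant.

```python
import heapq
from collections import Counter

# Hypothetical sample; a real table would be built from the full name list.
SAMPLE_NAMES = [
    "LATIN SMALL LETTER A",
    "LATIN CAPITAL LETTER A",
    "GREEK SMALL LETTER ALPHA",
    "HYPHEN-MINUS",
]


def build_code(names):
    """Build a Huffman code (character -> bit string) from the given names."""
    freq = Counter("".join(names))
    # Heap entries are (weight, unique tie-breaker, {symbol: code so far}).
    heap = [(w, i, {sym: ""}) for i, (sym, w) in enumerate(sorted(freq.items()))]
    heapq.heapify(heap)
    if len(heap) == 1:
        # Degenerate case: a single symbol still needs a one-bit code.
        return {sym: "0" for sym in heap[0][2]}
    next_id = len(heap)
    while len(heap) > 1:
        w1, _, c1 = heapq.heappop(heap)
        w2, _, c2 = heapq.heappop(heap)
        # Merge the two lightest subtrees, prefixing their codes with 0 and 1.
        merged = {s: "0" + c for s, c in c1.items()}
        merged.update({s: "1" + c for s, c in c2.items()})
        heapq.heappush(heap, (w1 + w2, next_id, merged))
        next_id += 1
    return heap[0][2]


def encode(name, code):
    """Return (packed bytes, number of significant bits) for one name."""
    bits = "".join(code[ch] for ch in name)
    padded = bits + "0" * (-len(bits) % 8)  # pad to a whole number of bytes
    data = bytes(int(padded[i:i + 8], 2) for i in range(0, len(padded), 8))
    return data, len(bits)


def decode(data, nbits, code):
    """Inverse of encode(); relies on the code being prefix-free."""
    inverse = {bits: sym for sym, bits in code.items()}
    bits = "".join(f"{byte:08b}" for byte in data)[:nbits]
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in inverse:
            out.append(inverse[current])
            current = ""
    return "".join(out)


# Precompute once; a shipped program would embed this table as a constant.
CODE = build_code(SAMPLE_NAMES)
```

A name round-trips with `decode(*encode(name, CODE), CODE)`. The bit count is returned alongside the bytes because the last byte is padded; a real name table would need to store that length, or reserve an end-of-name symbol, for each entry.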
More generally, I get the impression that the Unicode community is
particularly keen on inventing /ad hoc/ compression schemes. I still
haven't heard a sound rationale for the existence of the SCSU. What's
wrong with patent-free variants of LZW?
J.