From: Doug Ewell (dewell@adelphia.net)
Date: Thu Sep 21 2006 - 07:26:40 CDT
Hans Aberg <haberg at math dot su dot se> wrote:
> One way to do a character compression is to simply do a frequency
> analysis, sort the characters according to that, which gives a map
> code points -> code points. Then apply a variable width character
> encoding which gives smaller width to smaller non-negative integers,
> like say UTF-8, to that. Here, the compression method cannot do worse
> than UTF-8.
You mean, do Huffman encoding, but with bytes as the basic code unit
instead of bits?
Don't forget you need to store the frequency table along with the
compressed data, so the reader can reconstruct the table. That could
offset some of your compression gain.
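
A minimal sketch of the scheme under discussion, to make the table overhead concrete. It assumes the input is well-formed text (no lone surrogates) and that the table is sent as the ranked characters themselves, encoded in UTF-8. The function names `build_table`, `compress`, and `decompress` are hypothetical, chosen here for illustration. Ranks are stepped over the surrogate range, since those values cannot be encoded as UTF-8.

```python
from collections import Counter

SURROGATE_START = 0xD800
SURROGATE_COUNT = 0x800  # U+D800..U+DFFF cannot be encoded in UTF-8


def rank_to_code_point(rank: int) -> int:
    """Map a rank to a scalar value, skipping the surrogate range."""
    return rank if rank < SURROGATE_START else rank + SURROGATE_COUNT


def code_point_to_rank(cp: int) -> int:
    """Inverse of rank_to_code_point."""
    return cp if cp < SURROGATE_START else cp - SURROGATE_COUNT


def build_table(text: str) -> list[str]:
    """Distinct characters, most frequent first; ties broken by code point."""
    counts = Counter(text)
    return sorted(counts, key=lambda ch: (-counts[ch], ord(ch)))


def compress(text: str) -> tuple[bytes, bytes]:
    """Return (table_bytes, payload). Both are needed to decode."""
    table = build_table(text)
    rank = {ch: i for i, ch in enumerate(table)}
    # Replace each character by its rank, then let UTF-8 give the
    # smallest ranks (the most frequent characters) the shortest codes.
    remapped = "".join(chr(rank_to_code_point(rank[ch])) for ch in text)
    return "".join(table).encode("utf-8"), remapped.encode("utf-8")


def decompress(table_bytes: bytes, payload: bytes) -> str:
    """Rebuild the original text from the table and the payload."""
    table = table_bytes.decode("utf-8")
    return "".join(table[code_point_to_rank(ord(c))]
                   for c in payload.decode("utf-8"))


if __name__ == "__main__":
    sample = "привет привет мир"
    table_bytes, payload = compress(sample)
    print(len(sample.encode("utf-8")), len(payload), len(table_bytes))
    assert decompress(table_bytes, payload) == sample
```

For the short Cyrillic sample, plain UTF-8 takes 32 bytes. The remapped payload takes 17, but the table adds 15, so the total comes out even. A real format would also need to mark where the table ends (a length prefix, say), which costs a little more. On longer texts the table is amortized and the gain becomes visible.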
-- Doug Ewell Fullerton, California, USA http://users.adelphia.net/~dewell/ RFC 4645 * UTN #14