From: Hans Aberg (haberg@math.su.se)
Date: Thu Sep 21 2006 - 06:45:53 CDT
On 21 Sep 2006, at 08:13, Asmus Freytag wrote:
> If you assume a large alphabet, then your compression gets worse,
> even if the actual number of elements is small.
So why would that be? In one compression method, one simply makes a
frequency analysis of the characters used, and encodes based on that,
so table entries are needed only for the characters that actually occur.
One way to compress characters is to do a frequency analysis and sort
the characters by decreasing frequency, which gives a map code points
-> code points, with the most frequent characters mapped to the
smallest values. Then apply to the remapped text a variable-width
character encoding that gives smaller widths to smaller non-negative
integers, say UTF-8. This compression method cannot do worse than
plain UTF-8.
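As a rough illustration of the scheme sketched above, here is a minimal
Python sketch. The names rank_map and compress are mine, and the sketch
ignores how the rank table itself would be stored or transmitted, which
a real compressor would have to account for.

from collections import Counter

def rank_map(text):
    # Count character frequencies and assign the smallest code
    # points (ranks) to the most frequent characters.
    freq = Counter(text)
    return {ch: rank for rank, (ch, _) in enumerate(freq.most_common())}

def compress(text):
    # Replace each character by its frequency rank, then encode the
    # ranks with UTF-8: frequent characters become single bytes,
    # rarer ones two or more bytes. Decompression would also need
    # the rank table, which is simply returned here.
    mapping = rank_map(text)
    # Assumes fewer than 0xD800 distinct characters, so that no rank
    # falls in the surrogate range, which UTF-8 cannot encode.
    remapped = ''.join(chr(mapping[ch]) for ch in text)
    return mapping, remapped.encode('utf-8')

sample = "this is a small example text"
table, packed = compress(sample)
# Ignoring the table, the packed form is never longer than plain UTF-8.
print(len(sample.encode('utf-8')), len(packed))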
Hans Aberg