Re: Unicode & space in programming & l10n

From: Hans Aberg (haberg@math.su.se)
Date: Thu Sep 21 2006 - 08:32:10 CDT

    On 21 Sep 2006, at 14:26, Doug Ewell wrote:

    > Don't forget you need to store the frequency table along with the
    > compressed data, so the reader can reconstruct the table. That
    > could mitigate your compression somewhat.

    One can turn this problem around: first do the statistical analysis,
    then choose a translation table that admits a compact representation,
    which is not necessarily the table that gives the best compression of
    the text body. From this point of view, a byte-compression scheme
    simply restricts itself to a more limited set of translation tables.
    (A binary translation table can be viewed, at least roughly :-), as a
    code point translation table via suitable character encodings.) So
    when working in this code point compression picture, one has access
    to everything that can be done in the byte-compression picture.
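
    To make the trade-off concrete, here is a rough sketch in Python. It
    is only an illustration, not a scheme proposed in this thread: the
    function names, the 21-bit code point field and the 5-bit length
    field are all assumptions. It compares a Huffman table fitted to the
    text (expensive table, small body) with a table that is just a code
    point range and a fixed-width code (tiny table, larger body).

```python
import heapq
from collections import Counter
from math import ceil, log2

CODE_POINT_BITS = 21  # assumption: enough bits for any Unicode code point
LENGTH_BITS = 5       # assumption: field width for one stored code length


def huffman_lengths(freq):
    """Return {symbol: code length in bits} for a Huffman code."""
    if len(freq) == 1:
        return {sym: 1 for sym in freq}
    # Heap entries are (weight, tie_breaker, {symbol: depth}); the unique
    # tie_breaker keeps the dicts from ever being compared.
    heap = [(w, i, {s: 0}) for i, (s, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        w1, _, a = heapq.heappop(heap)
        w2, _, b = heapq.heappop(heap)
        merged = {s: d + 1 for s, d in {**a, **b}.items()}
        heapq.heappush(heap, (w1 + w2, tie, merged))
        tie += 1
    return heap[0][2]


def huffman_total_bits(text):
    """(table bits, body bits) when the table lists every symbol and its code length."""
    freq = Counter(text)
    lengths = huffman_lengths(freq)
    table = len(freq) * (CODE_POINT_BITS + LENGTH_BITS)
    body = sum(freq[s] * lengths[s] for s in freq)
    return table, body


def range_total_bits(text):
    """(table bits, body bits) when the table is only (lowest, highest) code point."""
    lo, hi = min(map(ord, text)), max(map(ord, text))
    width = max(1, ceil(log2(hi - lo + 1)))  # fixed-width code over the range
    table = 2 * CODE_POINT_BITS
    body = len(text) * width
    return table, body


if __name__ == "__main__":
    for sample in ("abracadabra", "abracadabra" * 100):
        print(len(sample), huffman_total_bits(sample), range_total_bits(sample))
```

    On a short text the compact table can win overall even though its
    body is larger; on a long text the cost of the table is amortized and
    the fitted table wins. Both functions ignore how the decoder learns
    the text length, which would cost the same in either case.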

       Hans Aberg


