Re: Unicode & space in programming & l10n

From: Doug Ewell (dewell@adelphia.net)
Date: Thu Sep 21 2006 - 07:26:40 CDT

Next message: Hans Aberg: "Re: Unicode & space in programming & l10n"

Previous message: Hans Aberg: "Re: Unicode & space in programming & l10n"
In reply to: Hans Aberg: "Re: Unicode & space in programming & l10n"
Next in thread: Hans Aberg: "Re: Unicode & space in programming & l10n"
Reply: Hans Aberg: "Re: Unicode & space in programming & l10n"
Reply: Hans Aberg: "Re: Unicode & space in programming & l10n"
Reply: Hans Aberg: "Re: Unicode & space in programming & l10n"
Reply: John D. Burger: "Re: Unicode & space in programming & l10n"

Hans Aberg <haberg at math dot su dot se> wrote:

> One way to do a character compression is to simply do a frequency
> analysis, sort the characters according to that, which gives a map
> code points -> code points. Then apply a variable width character
> encoding which gives smaller width to smaller non-negative integers,
> like say UTF-8, to that. Here, the compression method cannot do worse
> than UTF-8.

You mean, do Huffman encoding, but with bytes as the basic code unit
instead of bits?

Don't forget you need to store the frequency table along with the
compressed data, so the reader can reconstruct the mapping. That
overhead could offset your compression gains somewhat, especially
for short texts.

--
Doug Ewell
Fullerton, California, USA
http://users.adelphia.net/~dewell/
RFC 4645  *  UTN #14


This archive was generated by hypermail 2.1.5 : Thu Sep 21 2006 - 07:30:33 CDT