From: Doug Ewell (dewell@adelphia.net)
Date: Thu May 05 2005 - 00:23:38 CDT
N. Ganesan <naa dot ganesan at gmail dot com> wrote:
> May be if engineers here work with Arvind Thiagarajan,
> a Tamil engineer, now in Singapore, they can compress
> Unicode few orders of magnitude higher ?!
>
> http://in.rediff.com/money/2005/may/04spec.htm
As Philippe wrote, (a) SCSU, the Standard Compression Scheme for Unicode,
can compress Tamil text to one byte per character, plus a constant
overhead of two bytes for the *entire* text, and (b) general-purpose data
compression can always be applied to the text to shrink it further, if
you don't mind losing the ability to process the data as text.
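To make the arithmetic in (a) concrete, here is a minimal sketch of what an SCSU encoder would emit for Tamil text. It is not a complete SCSU implementation: it assumes the input contains only characters from the Tamil block (U+0B80..U+0BFF) plus ASCII, and the function name and sample string are purely illustrative. The two bytes of overhead are the SD0 tag (0x18) and the window index (0x17), which together point dynamic window 0 at the Tamil block.

```python
# Minimal sketch, NOT a full SCSU encoder: handles only Tamil-block
# characters and pass-through ASCII. Names here are illustrative.

TAMIL_BLOCK_START = 0x0B80   # first code point of the Tamil block
WINDOW_SIZE = 0x80           # SCSU dynamic windows cover 128 code points
SD0 = 0x18                   # SCSU tag: define and select dynamic window 0


def scsu_encode_tamil(text: str) -> bytes:
    """Encode Tamil-plus-ASCII text in SCSU single-byte mode."""
    # Two bytes of overhead for the whole text: SD0 + window index.
    # Index 0x17 means the window starts at 0x17 * 0x80 = U+0B80.
    out = bytearray([SD0, TAMIL_BLOCK_START // WINDOW_SIZE])
    for ch in text:
        cp = ord(ch)
        if 0x20 <= cp <= 0x7F or cp in (0x00, 0x09, 0x0A, 0x0D):
            out.append(cp)                              # ASCII passes through
        elif TAMIL_BLOCK_START <= cp < TAMIL_BLOCK_START + WINDOW_SIZE:
            out.append(0x80 + (cp - TAMIL_BLOCK_START))  # one byte per char
        else:
            raise ValueError(f"U+{cp:04X} is outside this sketch's scope")
    return bytes(out)


# Sample: two Tamil words separated by a space (10 code points in total).
sample = "\u0BA4\u0BAE\u0BBF\u0BB4\u0BCD \u0BAE\u0BCA\u0BB4\u0BBF"
encoded = scsu_encode_tamil(sample)

print("code points:", len(sample))
print("SCSU bytes: ", len(encoded))
print("UTF-16 bytes:", len(sample.encode("utf-16-be")))
print("UTF-8 bytes: ", len(sample.encode("utf-8")))
```

For this sample the SCSU form is the number of characters plus two, whereas UTF-8 spends three bytes on every Tamil character.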
Image data, such as the medical images described in the article, is well
known for being highly compressible. The trick is in finding the right
algorithm for the specific type of image. Not all algorithms work
equally well on all types of data.
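As a rough illustration of that last point, the sketch below compresses the same data twice with zlib: once as is, and once after a simple delta filter that stores each sample as its difference from the previous one. The data is a synthetic, slowly varying signal standing in for one row of a smooth image; it is an assumption made for the demonstration, not real medical image data, and the delta filter is not the technique described in the article.

```python
import random
import zlib


def smooth_signal(n: int, seed: int = 0) -> bytes:
    """Synthetic stand-in for smooth image data: each sample differs
    from its neighbour by at most 1 (hypothetical test data)."""
    rng = random.Random(seed)
    value = 128
    samples = bytearray()
    for _ in range(n):
        value = min(255, max(0, value + rng.choice((-1, 0, 1))))
        samples.append(value)
    return bytes(samples)


def delta_encode(data: bytes) -> bytes:
    """Replace each byte with its difference from the previous byte (mod 256)."""
    out = bytearray()
    prev = 0
    for b in data:
        out.append((b - prev) % 256)
        prev = b
    return bytes(out)


def delta_decode(data: bytes) -> bytes:
    """Invert delta_encode by accumulating the differences (mod 256)."""
    out = bytearray()
    prev = 0
    for d in data:
        prev = (prev + d) % 256
        out.append(prev)
    return bytes(out)


raw = smooth_signal(20000)
filtered = delta_encode(raw)

raw_size = len(zlib.compress(raw, 9))
filtered_size = len(zlib.compress(filtered, 9))

print("original bytes:       ", len(raw))
print("zlib alone:           ", raw_size)
print("delta filter + zlib:  ", filtered_size)
```

On data like this, the filtered stream should compress noticeably better, because the filter turns a wide range of sample values into just a few small differences that the general-purpose compressor can model. The same filter would do little or nothing for ordinary text.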
For more on compression of Unicode text, have a look at Unicode
Technical Note #14, "A Survey of Unicode Compression."
-- Doug Ewell Fullerton, California http://users.adelphia.net/~dewell/