From: Philippe Verdy (verdy_p@wanadoo.fr)
Date: Wed May 04 2005 - 16:37:52 CDT
From: "N. Ganesan" <naa.ganesan@gmail.com>
> Maybe if the engineers here work with Arvind Thiagarajan,
> a Tamil engineer now in Singapore, they can compress
> Unicode a few orders of magnitude better?!
>
> http://in.rediff.com/money/2005/may/04spec.htm
Lossless image compression (the focus of this article, for medical imagery)
is completely off-topic here. The techniques used to compress images are
unrelated to those used to compress text (even though Unicode text also needs
lossless compression). Whatever technique he uses on images, it almost
certainly exploits 2D gradient properties and represents those gradients with
a probabilistic but lossless encoding scheme; it cannot be applied to
compressing Unicode text.
Also, an image is a self-contained object, and no one really needs direct
access to the value of an individual pixel. This is not the case for text,
where one frequently needs to enumerate the abstract characters (code points)
that make up a string.
Tamil, for example, compresses very well with SCSU (nearly one encoded byte
per code point), as sketched below. You could achieve better compression by
compressing the whole text with a common dictionary-based compressor such as
Lempel-Ziv, but you would then have difficulty enumerating or accessing the
code point values at random positions in the text.
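A minimal sketch of why that is, assuming input limited to ASCII plus the
Tamil block; this is not a full SCSU implementation (see UTS #6), and the
names (toy_scsu_encode, TAMIL_BASE) are illustrative only. One
window-definition tag places dynamic window 0 over U+0B80..U+0BFF, after
which every Tamil code point takes a single byte:

    # Toy SCSU-style encoder: ASCII + Tamil only (a sketch, not UTS #6 compliant)
    SD0 = 0x18           # SCSU tag: define dynamic window 0 and select it
    TAMIL_BASE = 0x0B80  # start of the Tamil block

    def toy_scsu_encode(text: str) -> bytes:
        # Offset byte 0x17 (= 0x0B80 // 0x80) positions window 0 at U+0B80.
        out = bytearray([SD0, TAMIL_BASE // 0x80])
        for ch in text:
            cp = ord(ch)
            if 0x20 <= cp < 0x80 or cp in (0x00, 0x09, 0x0A, 0x0D):
                out.append(cp)                        # ASCII passes through as-is
            elif TAMIL_BASE <= cp <= TAMIL_BASE + 0x7F:
                out.append(0x80 + (cp - TAMIL_BASE))  # one byte per Tamil code point
            else:
                raise ValueError(f"outside this toy encoder's range: U+{cp:04X}")
        return bytes(out)

    if __name__ == "__main__":
        import zlib
        sample = "தமிழ் " * 40                        # short repetitive Tamil sample
        scsu = toy_scsu_encode(sample)
        print(len(sample), "code points ->", len(scsu), "SCSU-style bytes")
        # A dictionary compressor beats this on the whole text, but the result
        # no longer allows jumping to the Nth code point without decoding.
        print("zlib on UTF-8:", len(zlib.compress(sample.encode("utf-8"))), "bytes")

Note that in the SCSU-style output the Nth code point after the two-byte
window definition is still the Nth byte, whereas the zlib output must be
decompressed before any code point can be located.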
There is absolutely NOTHING in this article about text compression, so it is
useless and off-topic here.