From: Philippe Verdy (verdy_p@wanadoo.fr)
Date: Wed May 04 2005 - 16:37:52 CDT
From: "N. Ganesan" <naa.ganesan@gmail.com>
> May be if engineers here work with Arvind Thiagarajan,
> a Tamil engineer, now in Singapore, they can compress
> Unicode few orders of magnitude higher ?!
>
> http://in.rediff.com/money/2005/may/04spec.htm
Lossless image compression (the focus of this article, for medical imagery)
is completely off topic here. The techniques used to compress images are
completely unrelated to those used to compress text (even though Unicode
needs lossless compression). So whatever technique he uses on images, it
certainly relies on 2D gradient properties and represents those gradients
with a probabilistic but lossless encoding. It cannot be used to compress
Unicode text.
Also, an image is by itself a self-contained object, and no one really needs
direct access to the value of an individual pixel. This is not the case
for text, where one frequently needs to enumerate the abstract characters
(code points) that make up a string.
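To make that contrast concrete, here is a tiny Python sketch (my own
illustration, not from the article): enumerating the code points of a string
is a basic, direct operation, with no real analogue for an image handled as
an opaque blob.

    text = "தமிழ்"                        # a short Tamil string
    # Each abstract character (code point) is directly enumerable.
    print([hex(ord(ch)) for ch in text])
    # ['0xba4', '0xbae', '0xbbf', '0xbb4', '0xbcd']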
Tamil already compresses very well with SCSU, for example (to nearly one
encoded byte per code point). You could achieve better compression by
compressing the whole text with a common dictionary-based compressor such as
Lempel-Ziv, but you would then have difficulty enumerating the code points or
accessing them at a random position in the text.
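As a rough sketch of that trade-off (my own illustration; the sample passage,
the variable names and the use of zlib as a stand-in Lempel-Ziv-style
compressor are just assumptions for the example): a general-purpose compressor
can shrink repetitive UTF-8 Tamil text well below one byte per code point, but
the only way to read the code point at an arbitrary index is to decompress
from the start, whereas the uncompressed string supports direct indexing.

    import zlib

    # A sample Tamil passage, repeated so the compressor has redundancy to exploit.
    text = "தமிழ் ஒரு செம்மொழி. " * 500
    utf8 = text.encode("utf-8")        # Tamil code points take 3 bytes each in UTF-8
    packed = zlib.compress(utf8)

    print(len(text), "code points")
    print(len(utf8), "UTF-8 bytes ->", len(packed), "zlib bytes")
    print(round(len(packed) / len(text), 2), "bytes per code point after zlib")

    # Direct access is trivial on the uncompressed string.
    i = 1234
    print(hex(ord(text[i])))

    # Access into the compressed stream: there is no way around inflating
    # everything (or at least everything up to position i) before indexing.
    def codepoint_at(blob: bytes, index: int) -> int:
        return ord(zlib.decompress(blob).decode("utf-8")[index])

    print(hex(codepoint_at(packed, i)))

The "nearly one byte per code point" figure for SCSU comes from its dynamic
windows: once a window is positioned over the Tamil block (U+0B80-U+0BFF),
each Tamil character encodes as a single byte, and the result remains a
character encoding that can be decoded incrementally.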
There is absolutely NOTHING in this article about text compression, so it is
irrelevant and off topic here.