From: Doug Ewell (dewell@roadrunner.com)
Date: Thu Feb 07 2008 - 10:26:01 CST
Every once in a while I bring up some of the issues raised in Unicode
Technical Note #14, "A Survey of Unicode Compression," and someone
replies that there really isn't much interest in text compression any
more, now that memory is cheap and disk is cheap and everyone in the
world has a greased-lightning Internet connection at their disposal. It
looks like there might be some lingering interest after all.
Those attempting to defend Unicode against the duplicate encoding
proposed by the Tamils might note that existing Unicode Tamil text can
be reduced to 1 byte per (Unicode) character using SCSU, which is not
true for TACE-16 text, spread as it is across multiple 128-character
half-blocks. I don't have any TACE-16 text at hand, but it wouldn't
surprise me if Unicode Tamil in SCSU were actually smaller than TACE-16
in any encoding scheme. And remember, *decoding* SCSU is easy.
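
To make the 1-byte-per-character claim concrete, here is a minimal sketch in Python. It is not a full SCSU implementation: it assumes the text contains only characters from the Unicode Tamil block (U+0B80..U+0BFF) plus printable ASCII, NUL, tab, LF and CR, and it rejects everything else. The function and constant names are my own, made up for illustration. It relies on the SD0 tag (0x18), which defines and selects dynamic window 0, followed by an offset byte that is the window start divided by 0x80.

```python
# Minimal sketch of SCSU single-byte mode for Tamil text.
# Assumption: input is limited to the Tamil block plus pass-through ASCII;
# a real SCSU encoder handles all of Unicode and chooses windows dynamically.

SD0 = 0x18                          # SCSU tag: define dynamic window 0 and select it
TAMIL_BASE = 0x0B80                 # first code point of the Unicode Tamil block
WINDOW_OFFSET_BYTE = TAMIL_BASE // 0x80   # 0x17, valid for window starts below U+3400

# Control characters that SCSU passes through unchanged in single-byte mode.
PASSTHROUGH_CONTROLS = {0x00, 0x09, 0x0A, 0x0D}


def scsu_encode_tamil(text):
    """Encode Tamil-plus-ASCII text as SCSU (hypothetical helper)."""
    out = bytearray([SD0, WINDOW_OFFSET_BYTE])   # point window 0 at U+0B80
    for ch in text:
        cp = ord(ch)
        if 0x20 <= cp <= 0x7F or cp in PASSTHROUGH_CONTROLS:
            out.append(cp)                        # ASCII passes through as itself
        elif TAMIL_BASE <= cp < TAMIL_BASE + 0x80:
            out.append(0x80 + (cp - TAMIL_BASE))  # one byte per Tamil character
        else:
            raise ValueError(f"U+{cp:04X} is outside this sketch's subset")
    return bytes(out)


def scsu_decode_tamil(data):
    """Decode only what scsu_encode_tamil produces (not a general decoder)."""
    if data[:2] != bytes([SD0, WINDOW_OFFSET_BYTE]):
        raise ValueError("expected SD0 defining a window at U+0B80")
    chars = []
    for b in data[2:]:
        if b >= 0x80:
            chars.append(chr(TAMIL_BASE + (b - 0x80)))  # byte is a window offset
        else:
            chars.append(chr(b))                        # pass-through byte
    return "".join(chars)
```

For the five-character word U+0BA4 U+0BAE U+0BBF U+0BB4 U+0BCD this gives 7 bytes (18 17 A4 AE BF B4 CD): the two-byte window definition once, then one byte per character, against 15 bytes in UTF-8 and 10 in UTF-16. The decoder side is just an add and a range check, which is the sense in which decoding is easy.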
For them what cares:
http://www.unicode.org/notes/tn14/UnicodeCompression.pdf
--
Doug Ewell * Fullerton, California, USA * RFC 4645 * UTN #14
http://www.ewellic.org
http://www1.ietf.org/html.charters/ltru-charter.html
http://www.alvestrand.no/mailman/listinfo/ietf-languages