Re: All-kana documents

From: Martin Duerst (duerst@w3.org)
Date: Tue Mar 05 2002 - 09:31:38 EST


Character-based compression schemes have been suggested by others.
But they are not necessary: you can take any generic data compression
method (e.g. zip) and it will compress such text very efficiently.

The big advantage of using generic data compression is that
it's already widely available, and in some cases (e.g. modem dialup,
some operating systems, web browsers) it is already built in.
The main disadvantage is that it's not as efficient as specialized
methods for very short pieces of data.
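
As a rough illustration of both points, here is a minimal sketch using
Python's standard zlib module, which implements the DEFLATE method that
zip commonly uses. The helper names and the sample text are assumptions
made up for this example: the long sample is uniformly random hiragana,
which is harder to compress than real prose, and the short sample is a
single two-character word.

```python
import random
import zlib


def utf8_and_deflated_sizes(text, level=9):
    """Return (UTF-8 byte count, zlib-compressed byte count) for text."""
    raw = text.encode("utf-8")
    return len(raw), len(zlib.compress(raw, level))


def random_hiragana(count, seed=0):
    """Hypothetical all-kana sample: uniformly random hiragana.

    Every code point in U+3041..U+3093 takes three bytes in UTF-8,
    and the first byte is always 0xE3 (the "this is kana" byte).
    """
    rng = random.Random(seed)
    return "".join(chr(rng.randint(0x3041, 0x3093)) for _ in range(count))


if __name__ == "__main__":
    samples = [("long", random_hiragana(2000)), ("short", "かな")]
    for label, text in samples:
        raw, packed = utf8_and_deflated_sizes(text)
        print(f"{label}: {raw} bytes as UTF-8, {packed} bytes compressed")
```

The long sample should shrink well below its UTF-8 size, since the
repeated lead bytes cost very little once compressed. The short sample
comes out larger than its input, because the zlib container alone adds
a 2-byte header and a 4-byte checksum.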

Regards, Martin.

At 18:47 02/03/04 -0500, ろ〇〇〇〇 ろ〇〇〇 wrote:
>If I have some all-kana documents (like, say, if I decide to encode some
>old women's literature, not that I will, but you might), is there an
>extension of UTF-8 that will allow me to strip off the redundant "this is
>kana" byte from most of the kana? After the first few thousand kana, it
>might be like, "Yeah, we get it already! It's kana! It's KANA!! You can
>stop reminding us now!!"
>
>This goes too for Hebrew, Greek, etc.
>
>十一ちゃん   愛瘢雹加蘭馬


