Re: All-kana documents

From: Martin Duerst ([email protected])
Date: Tue Mar 05 2002 - 09:31:38 EST

Previous message: Michael Everson: "Re: Offtopic : Unicode and Bengali"
In reply to: ろ〇〇〇〇ろ〇〇〇: "All-kana documents"
Next in thread: Doug Ewell: "Re: All-kana documents"
Reply: Doug Ewell: "Re: All-kana documents"

Character-based compression schemes have been suggested by others.
But this is not necessary: you can take any generic data compression
method (e.g. zip,...) and it will compress such text very efficiently.

The big advantage of using generic data compression is that
it's already widely available, and in some cases (e.g. modem dialup,
some operating systems, web browsers) it is already built in.
The main disadvantage is that it's not as efficient as specialized
methods for very short pieces of data.
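
To make both points concrete, here is a minimal sketch in Python using
the standard zlib module (DEFLATE, the method zip commonly uses). The
sample data and the names HIRAGANA, utf8_and_deflate_sizes, long_text
and short_text are illustrative assumptions, not taken from the original
question. Random kana is a pessimistic test case; real text has more
repetition and would likely compress better.

```python
import random
import zlib

# Hiragana U+3041..U+3093; each of these code points is 3 bytes in UTF-8.
HIRAGANA = [chr(cp) for cp in range(0x3041, 0x3094)]

def utf8_and_deflate_sizes(text):
    """Return (UTF-8 byte count, zlib-compressed byte count) for text."""
    raw = text.encode("utf-8")
    return len(raw), len(zlib.compress(raw, 9))

# Assumed sample data: 20000 random hiragana (fixed seed for repeatability)
# standing in for a long all-kana document, plus one very short string.
rng = random.Random(0)
long_text = "".join(rng.choice(HIRAGANA) for _ in range(20000))
short_text = "ひらがな"

long_raw, long_packed = utf8_and_deflate_sizes(long_text)
short_raw, short_packed = utf8_and_deflate_sizes(short_text)

print(f"long:  {long_raw} bytes UTF-8 -> {long_packed} bytes compressed")
print(f"short: {short_raw} bytes UTF-8 -> {short_packed} bytes compressed")
```

On the long sample the compressed size should come out well below the
UTF-8 size, since the repeated lead bytes are cheap to encode. On the
four-character sample the fixed zlib framing overhead makes the output
larger than the input, which is the short-data disadvantage mentioned
above.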

Regards, Martin.

At 18:47 02/03/04 -0500, ろ〇〇〇〇ろ〇〇〇 wrote:
>If I have some all-kana documents (like, say, if I decide to encode some
>old women's literature, not that I will, but you might), is there an
>extension of UTF-8 that will allow me to strip off the redundant "this is
>kana" byte from most of the kana? After the first few thousand kana, it
>might be like, "Yeah, we get it already! It's kana! It's KANA!! You can
>stop reminding us now!!"
>
>This goes too for Hebrew, Greek, etc.
>
>十一ちゃん　　　愛瘢雹加蘭馬

This archive was generated by hypermail 2.1.2 : Tue Mar 05 2002 - 09:52:45 EST