Re: All-kana documents

From: Doug Ewell (dewell@adelphia.net)
Date: Tue Mar 05 2002 - 11:20:25 EST


"Martin Duerst" <duerst@w3.org> wrote:

> Character-based compression schemes have been suggested by others.
> But this is not necessary; you can take any generic data compression
> method (e.g. zip, ...) and it will compress very efficiently.

Or you can take SCSU-encoded data and apply the generic compression
method to that, and it will probably compress even better.
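
For the curious, the effect is easy to demonstrate. Here is a quick
sketch in Python; the one-byte-per-character stream is only a stand-in
for real SCSU output (actual SCSU also spends a few tag bytes to select
a window), and the Iroha pangram is just a convenient all-hiragana
sample:

import zlib

# The Iroha, a traditional pangram containing each classical
# hiragana exactly once.
IROHA = ("いろはにほへとちりぬるをわかよたれそつねならむ"
         "うゐのおくやまけふこえてあさきゆめみしゑひもせす")
text = IROHA * 10

utf8 = text.encode("utf-8")  # three bytes per kana

# Stand-in for SCSU: one byte per character, as an offset from a
# window base covering the hiragana block.
WINDOW_BASE = 0x3040
offsets = bytes(ord(c) - WINDOW_BASE for c in text)

for label, data in (("UTF-8", utf8), ("window offsets", offsets)):
    print(f"{label:14} raw={len(data):5}  zlib={len(zlib.compress(data)):5}")

# Generic compressors also carry fixed overhead (zlib adds a header
# plus an Adler-32 checksum), which can exceed a very short payload.
tiny = "すし".encode("utf-8")
print(f"{'tiny input':14} raw={len(tiny):5}  zlib={len(zlib.compress(tiny)):5}")

Exact numbers depend on the deflate implementation, but the offset
stream hands zlib input with the per-character redundancy already
removed.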

> The big advantage of using generic data compression is that
> it's already available widely, and in some cases (e.g. modem dialup,
> some operating systems, web browsers) it is already built in.
> The main disadvantage is that it's not as efficient as specialized
> methods for very short pieces of data.

All of this is absolutely correct. But juuitchan had asked for "an
extension of UTF-8" that would eliminate the specific redundancy of
having all (or most) of the characters come from the same contiguous
block. This is exactly what SCSU does best.
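
To see the redundancy juuitchan is talking about in the bytes
themselves: every hiragana character costs three bytes in UTF-8, and
the first two of them are nearly constant across the block (0xE3 0x81
or 0xE3 0x82). A Python one-liner makes it visible:

# Dump the UTF-8 bytes of a few hiragana; note the repeated lead bytes.
for c in "あいうえおなにぬねのわゐゑを":
    print(f"U+{ord(c):04X} {c} -> {c.encode('utf-8').hex(' ')}")

SCSU pays for the block selection once, with a window tag, and then
spends a single byte per character, which is exactly the saving an
all-kana document wants.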

Question: Are there really all-kana documents in the real world (other
than children's books)? Or is this one of those exercises like writing
an English-language novel without the letter E?

-Doug Ewell
 Fullerton, California


