"Martin Duerst" <duerst@w3.org> wrote:
> Character-based compression schemes have been suggested by others.
> But this is not necessary; you can take any generic data compression
> method (e.g. zip,...) and it will compress very efficiently.
Or you can take SCSU-encoded data and apply the generic compression
method to that, and it will probably compress even better.
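To put rough numbers on that, here is a throwaway Python sketch. The
"windowed" encoding is my own toy stand-in for SCSU's single-byte mode
(one byte per character inside a 128-character window), not a conforming
SCSU encoder, and the sample string and repeat count are invented, so
the exact ratios will vary with the input:

    import zlib

    # All-hiragana sample, repeated so the compressors have real input.
    text = "これはかなだけでかかれたれいぶんです" * 50

    utf8 = text.encode("utf-8")      # UTF-8 spends 3 bytes on every kana

    # Toy stand-in for SCSU's single-byte mode: each code point in the
    # 128-character window based at U+3040 becomes one byte, namely its
    # offset from the window base.  This is NOT conforming SCSU.
    BASE = 0x3040
    assert all(BASE <= ord(c) < BASE + 0x80 for c in text)
    windowed = bytes(ord(c) - BASE for c in text)

    for label, data in (("UTF-8", utf8), ("windowed", windowed)):
        print(label, len(data), "->", len(zlib.compress(data)))

The windowed form is a third the size going in, and the deflated output
should come out smaller too, though highly repetitive input like this
narrows the gap.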
> The big advantage of using generic data compression is that
> it's already widely available, and in some cases (e.g. modem dialup,
> some operating systems, web browsers) it is already built in.
> The main disadvantage is that it's not as efficient as specialized
> methods for very short pieces of data.
All of this is absolutely correct. But juuitchan had asked for "an
extension of UTF-8" that would eliminate the specific redundancy of
having all (or most) of the characters come from the same contiguous
block. This is exactly what SCSU does best.
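For the curious, the mechanism looks roughly like this. The sketch below
covers one corner of SCSU's single-byte mode as I read UTS #6 (SD0
redefines dynamic window 0, and 0xFD is the special offset index for
Hiragana); a real encoder also handles window switching, Unicode mode,
and so on:

    # One corner of SCSU's single-byte mode, per my reading of UTS #6:
    # SD0 (0x18) redefines dynamic window 0, index 0xFD selects the
    # special Hiragana offset U+3040, and from then on bytes 0x80-0xFF
    # stand for window_base + (byte - 0x80); printable ASCII passes through.
    SD0, HIRAGANA_INDEX, WINDOW_BASE = 0x18, 0xFD, 0x3040

    def encode_kana_run(text: str) -> bytes:
        out = bytearray([SD0, HIRAGANA_INDEX])       # one 2-byte tag per run
        for ch in text:
            cp = ord(ch)
            if 0x20 <= cp <= 0x7E:                   # ASCII as-is
                out.append(cp)
            elif WINDOW_BASE <= cp < WINDOW_BASE + 0x80:
                out.append(0x80 + cp - WINDOW_BASE)  # one byte per kana
            else:
                raise ValueError(f"U+{cp:04X} is outside this sketch's window")
        return bytes(out)

    kana = "これはかなだけでかかれたれいぶんです"
    print(len(kana.encode("utf-8")), len(encode_kana_run(kana)))   # 54 vs. 20

So a run of n hiragana costs about n+2 bytes instead of 3n, which is
exactly the same-block redundancy being squeezed out.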
Question: Are there really all-kana documents in the real world (other
than children's books)? Or is this one of those exercises like writing
an English-language novel without the letter E?
-Doug Ewell
Fullerton, California