From: Doug Ewell (doug@ewellic.org)
Date: Tue Feb 22 2011 - 14:06:11 CST
<mpsuzuki at hiroshima dash u dot ac dot jp> wrote:
> The resynchronization on newline (or on ASCII punctuation)
> is needed, but I think today it is becoming insufficient
> gradually.
Again, it depends on the intended purpose of this (or any other)
encoding scheme. Resynchronization adds redundancy, which costs bytes.
If the goal is to minimize bytes, the encoding scheme has to strip away
as much redundancy as possible.
Most people now suggest general-purpose compression as the "best" way to
compress Unicode text. Yet drop one byte out of a deflated or bzipped
file, and the damage is not confined to the neighborhood of the lost
byte; the resulting damage to the text will be arbitrary.
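
To illustrate the point about deflate, here is a minimal Python sketch
using the standard zlib module. The helper names (drop_byte,
try_inflate) and the sample text are made up for this example, and the
choice of which byte to drop is arbitrary. It only shows that a deflated
stream missing one byte does not give the original text back.

```python
import zlib


def drop_byte(data: bytes, index: int) -> bytes:
    """Return a copy of data with the byte at index removed."""
    return data[:index] + data[index + 1:]


def try_inflate(data: bytes):
    """Inflate data and decode it as UTF-8; return None if zlib rejects it."""
    try:
        return zlib.decompress(data).decode("utf-8", errors="replace")
    except zlib.error:
        return None


# Hypothetical sample text, chosen only for illustration.
text = ("Resynchronization adds redundancy, which costs bytes. "
        "General-purpose compression strips that redundancy away.")

packed = zlib.compress(text.encode("utf-8"))
damaged = drop_byte(packed, len(packed) // 2)  # lose one byte mid-stream

print("intact stream round-trips:", try_inflate(packed) == text)
print("damaged stream gives original text:", try_inflate(damaged) == text)
```

Nothing in the deflated stream marks a place where the decoder could
pick up again after the gap, so there is no analogue of resynchronizing
on a newline.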
Note that UTF-8, which has plenty of redundancy, was never represented
to be the smallest possible way to encode characters; it was only
represented not to be extravagant.
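
For contrast, a similar sketch for UTF-8, again in Python with a
made-up helper name (drop_and_decode) and sample string. It assumes the
decoder is asked to substitute U+FFFD for malformed sequences rather
than raise an error. The redundancy in the lead and continuation bytes
lets the decoder resynchronize at the next character, so the damage
stays local.

```python
def drop_and_decode(text: str, byte_index: int) -> str:
    """Encode text as UTF-8, drop one byte, and decode leniently."""
    raw = text.encode("utf-8")
    damaged = raw[:byte_index] + raw[byte_index + 1:]
    # errors="replace" substitutes U+FFFD for each malformed sequence
    return damaged.decode("utf-8", errors="replace")


# Hypothetical sample: 4 ASCII bytes, then five 3-byte characters.
text = "abc こんにちは xyz"

# Byte 5 is the middle byte of the first 3-byte character.
result = drop_and_decode(text, 5)
print(result)
```

Only the character that lost a byte is affected; the text before and
after it decodes as it did originally.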
--
Doug Ewell | Thornton, Colorado, USA | http://www.ewellic.org
RFC 5645, 4645, UTN #14 | ietf-languages @ is dot gd slash 2kf0s