From: Doug Ewell (dewell@adelphia.net)
Date: Mon Jan 22 2007 - 01:06:55 CST
Ruszlan Gaszanov <ruszlan at ather dot net> wrote:
> Well, SCSU and BOCU are too complex to be considered plain text
> encodings, and do not provide significant advantages comparing to
> general-purpose compression formats, while being much more
> specialized. Therefore, their usability is questionable.
SCSU and BOCU(-1) are most certainly plain-text encodings. Complexity
does not disqualify them from that role, any more than it does for
UTF-7. Their "specialization" is in representing Unicode text; they are
relatively unsuitable for representing arbitrary integer values. I
don't see how this makes them less useful for their intended purpose.
The greatest roadblock to acceptance of SCSU is the *perception* that it
is complex. It is not nearly as complicated as it is perceived to be,
and I say this having implemented both simple and optimized encoders as
well as decoders. Algorithms like MD5, Punycode, and gzip are quite a
bit more complex, yet you don't hear anyone complaining that gzip should
not be used because it's too complex.
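To give a sense of how small the core of an SCSU encoder can be, here is a minimal sketch in Python. It assumes the single-byte-mode tag values and the eight default dynamic-window offsets defined in UTS #6 (SQ0 = 0x01, SQU = 0x0E, SC0 = 0x10, SD0 = 0x18). The sketch is deliberately naive, not an optimized encoder: it never enters Unicode mode, picks which window to redefine round-robin, ignores the special half-block window offsets, and rejects supplementary characters. The name `scsu_encode` is hypothetical.

```python
# Tag bytes used in SCSU single-byte mode (assumed from UTS #6)
SQ0 = 0x01  # quote one char from window 0 (static window 0 when next byte < 0x80)
SQU = 0x0E  # quote one UTF-16 code unit (two bytes, big-endian)
SC0 = 0x10  # SC0..SC7: make dynamic window n active
SD0 = 0x18  # SD0..SD7: define dynamic window n, then make it active

# Initial offsets of the eight dynamic windows
DEFAULT_WINDOWS = [0x0080, 0x00C0, 0x0400, 0x0600, 0x0900, 0x3040, 0x30A0, 0xFF00]

# Control bytes that pass through unchanged in single-byte mode
PASS_THROUGH = {0x00, 0x09, 0x0A, 0x0D}


def scsu_encode(text: str) -> bytes:
    """Naive SCSU encoder sketch: single-byte mode only, BMP characters only."""
    windows = list(DEFAULT_WINDOWS)
    active = 0          # window 0 is active at the start of a stream
    next_redefine = 0   # naive round-robin choice of window to redefine
    out = bytearray()
    for ch in text:
        cp = ord(ch)
        if cp > 0xFFFF:
            raise ValueError("this sketch handles BMP characters only")
        if 0x20 <= cp <= 0x7F or cp in PASS_THROUGH:
            out.append(cp)                        # ASCII passes through as-is
        elif cp < 0x20:
            out += bytes([SQ0, cp])               # other C0 controls collide with tags
        elif windows[active] <= cp < windows[active] + 0x80:
            out.append(0x80 + cp - windows[active])
        else:
            for n, off in enumerate(windows):
                if off <= cp < off + 0x80:       # another window already covers cp
                    active = n
                    out += bytes([SC0 + n, 0x80 + cp - off])
                    break
            else:
                if cp < 0x3400:
                    # Offset index x in 0x01..0x67 selects window offset x * 0x80
                    x = cp >> 7
                    n = next_redefine
                    next_redefine = (n + 1) % 8
                    windows[n] = x << 7
                    active = n
                    out += bytes([SD0 + n, x, 0x80 + (cp & 0x7F)])
                else:
                    # e.g. CJK: quote the code unit instead of switching modes
                    out += bytes([SQU, cp >> 8, cp & 0xFF])
    return bytes(out)


if __name__ == "__main__":
    print(scsu_encode("Привет").hex())  # 129fc0b8b2b5c2
```

With the default windows, Latin-1 text such as "café" comes out byte-identical to ISO 8859-1, and a Cyrillic word costs one window-switch byte plus one byte per letter. A real encoder would add lookahead to decide when a run of CJK justifies switching to Unicode mode.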
BOCU-1 is less complex than SCSU, though more obscure, and it has an
additional problem: its core algorithm is covered by a U.S. patent
(6,737,994) owned by IBM. Although IBM currently offers a royalty-free
license, it has been known to change its licensing terms from time to
time. I've personally stayed away from BOCU-1 since the patent was
disclosed; memories of the Unisys GIF patent are still too fresh in my
mind.
Saying that compression-oriented encodings offer no significant
advantage over general-purpose compression is a bit like saying "History
shows that...": it's not that simple. Some general-purpose formats are
relatively sensitive to the format of the input data, and others are
not. It seems hard to justify playing the card that general-purpose
compression is more efficient while also playing the card that
compression-oriented encodings (which are much less complex than
general-purpose compression) are too complex.
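One rough way to see how a general-purpose compressor reacts to the format of its input is to compress the same text in several Unicode encoding forms and compare the sizes. The sketch below uses Python's standard `zlib` module (DEFLATE); the helper name `compressed_sizes`, the sample text, and the choice of encodings are illustrative assumptions, and no particular result is claimed, since sizes vary with the text and the compressor.

```python
import zlib
from typing import Dict, Tuple


def compressed_sizes(text: str) -> Dict[str, Tuple[int, int]]:
    """Return (raw size, DEFLATE size) in bytes for several encodings of text."""
    sizes = {}
    for enc in ("utf-8", "utf-16-be", "utf-32-be"):
        raw = text.encode(enc)
        packed = zlib.compress(raw, 9)          # level 9 = best compression
        assert zlib.decompress(packed) == raw   # sanity check: lossless round trip
        sizes[enc] = (len(raw), len(packed))
    return sizes


if __name__ == "__main__":
    # Illustrative sample only; a fair comparison needs realistic corpora
    sample = "Привет, мир! " * 50
    for enc, (raw, packed) in compressed_sizes(sample).items():
        print(f"{enc:10} raw={raw:5d} deflated={packed:5d}")
```

Adding SCSU or BOCU-1 output as another input, or swapping in a different compressor, follows the same pattern.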
--
Doug Ewell * Fullerton, California, USA * RFC 4645 * UTN #14
http://users.adelphia.net/~dewell/
http://www1.ietf.org/html.charters/ltru-charter.html
http://www.alvestrand.no/mailman/listinfo/ietf-languages