From: Frank Ellermann (nobody@xyzzy.claranet.de)
Date: Mon Jan 22 2007 - 11:58:01 CST
Doug Ewell wrote:
> The greatest roadblock to acceptance of SCSU is its *perception* of
> complexity. It is not nearly as complicated as it is perceived to be,
> and I say this having implemented both simple and optimized encoders as
> well as decoders. Algorithms like MD5 and Punycode and gzip are quite a
> bit more complex
Wait a moment, I've implemented MD5, UTF-1, UTF-7, and BOCU-1, but so far
I have given up on SCSU. To say that it's horrible would be putting it mildly.
One of the nice features of BOCU-1 is that a single error destroys at most
one line. With UTF-8 a single error destroys at most one code point. Try
that with SCSU and its various ways to encode the same piece of text.
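
To illustrate the UTF-8 claim, here is a minimal Python sketch, assuming the "single error" is one overwritten byte (not an inserted or dropped one) and that the decoder replaces invalid input instead of aborting. The helper names are made up for this example:

```python
def corrupt_and_decode(text, byte_index, new_byte):
    """Overwrite one byte of the UTF-8 form of text, then decode leniently."""
    data = bytearray(text.encode("utf-8"))
    data[byte_index] = new_byte
    # errors="replace" turns each invalid sequence into U+FFFD and continues.
    return bytes(data).decode("utf-8", errors="replace")


def char_index_of_byte(text, byte_index):
    """Index of the code point whose UTF-8 bytes include byte_index."""
    offset = 0
    for i, ch in enumerate(text):
        offset += len(ch.encode("utf-8"))
        if byte_index < offset:
            return i
    raise IndexError(byte_index)


def neighbours_survive(text, byte_index, new_byte):
    """True if all code points before and after the damaged one are intact."""
    i = char_index_of_byte(text, byte_index)
    decoded = corrupt_and_decode(text, byte_index, new_byte)
    return decoded.startswith(text[:i]) and decoded.endswith(text[i + 1:])


if __name__ == "__main__":
    sample = "na\u00efve \u20ac5 \U0001F600 ok"
    size = len(sample.encode("utf-8"))
    ok = all(neighbours_survive(sample, pos, value)
             for pos in range(size) for value in range(256))
    print("damage stays inside one code point:", ok)
```

The damaged code point itself may decode to several replacement characters, or to a different valid character, but the text on either side of it is untouched. That works because lead bytes and continuation bytes use disjoint value ranges, so a decoder can resynchronize at the next lead byte.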
> BOCU-1 is less complex, but more obscure
Not at all. It's a rather smart application of the 3*7 bits idea discussed
in this thread; at some point it uses 1114111 = 2**20 + 2**16 - 1 as the
biggest possible "jump".
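
The arithmetic is easy to check. The following Python sketch only verifies the numbers quoted above; it is not BOCU-1 code, and the function names are invented for the illustration:

```python
MAX_CODE_POINT = 0x10FFFF  # last code point of Unicode plane 16


def max_jump():
    """Largest possible distance between two code points (U+0000 to U+10FFFF)."""
    return MAX_CODE_POINT - 0


def fits_in_bits(value, bits):
    """True if a non-negative value is representable in the given number of bits."""
    return 0 <= value < 2 ** bits


if __name__ == "__main__":
    # 16 supplementary planes (2**20) plus the BMP (2**16), minus one.
    print(max_jump() == 2 ** 20 + 2 ** 16 - 1)
    # Three groups of 7 bits give 21 bits; 20 bits would not be enough.
    print(fits_in_bits(max_jump(), 3 * 7), fits_in_bits(max_jump(), 20))
```

So the magnitude of any such jump fits into 3*7 = 21 bits with room to spare, since 2**21 = 2097152.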
> an additional problem: its core algorithm is covered under a U.S.
> patent (6,737,994) owned by IBM. Although they currently offer a
> royalty-free license, IBM has been known to change their terms of
> licensing from time to time.
So far they haven't told me that my BOCU-1 script needs a license - okay,
that's no serious objection. IMO nobody needs a special compression for
Unicode anyway. But in theory BOCU-1 is nice, especially when compared
with SCSU.
> memories of the Unisys GIF patent are still too fresh in my mind.
The LZW patent has now expired worldwide. It was possible to create
uncompressed GIFs: http://purl.net/xyzzy/pub/clear1x1.gif (45 bytes)
vs. clearlzw.gif (43 bytes) is an admittedly silly example.
Frank