From: Doug Ewell (dewell@adelphia.net)
Date: Tue Jan 23 2007 - 00:53:38 CST
Frank Ellermann <nobody at xyzzy dot claranet dot de> wrote:
> Wait a moment, I've implemented MD5, UTF-1, UTF-7, and BOCU-1, and so
> far I gave up on SCSU. To say that it's horrible would be putting it
> mildly.
Before you give up for good, try reading the Appendix of UTN #14.
> One of the nice features of BOCU-1: a single error destroys at most
> one line. With UTF-8, a single error destroys at most one code point.
> Try that with SCSU, and its various ways to encode the same piece of
> text.
I do not disagree; SCSU is very stateful.
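To illustrate the UTF-8 half of that comparison, here is a minimal Python sketch. The helper name corrupt_and_decode and the sample string are made up for this example. It overwrites one byte of a UTF-8 string and decodes leniently, so the damage shows up as U+FFFD while the text on either side survives.

```python
def corrupt_and_decode(text, index, new_byte):
    """Overwrite one byte of the UTF-8 form of text and decode leniently."""
    data = bytearray(text.encode("utf-8"))
    data[index] = new_byte
    # errors="replace" substitutes U+FFFD for each undecodable sequence
    # instead of raising, so the decoder resynchronizes and carries on.
    return data.decode("utf-8", errors="replace")


original = "naïve café"
# Byte 3 is the trail byte of "ï" (0xC3 0xAF); replace it with ASCII "A".
damaged = corrupt_and_decode(original, 3, 0x41)
print(damaged)
```

Only the "ï" is lost. Everything before and after it decodes as before, because each UTF-8 sequence can be recognized without any state carried over from earlier bytes. A stateful encoding offers no such guarantee.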
>> BOCU-1 is less complex, but more obscure
>
> Not at all; it's a rather smart application of the 3*7 bits idea
> discussed in this thread. At some point it uses 1114111 = 2**20 +
> 2**16 - 1 as the biggest possible "jump".
By "obscure" I meant "unknown to most people."
It is possible (but invalid) in BOCU-1 to jump farther than ±0x10FFFF,
not to mention jumps of much shorter distances that would land outside
the valid range. In addition, jumps of +0x2DD0C or more, or –0x2DD0D
or less, are four-byte sequences. I think you may be thinking of
some other algorithm.
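The thresholds can be reproduced with a little arithmetic. The Python sketch below is only an illustration. Its constants (243 usable trail bytes, 64 single-byte differences on each side, and 43, 3 and 1 lead bytes per direction for the 2-, 3- and 4-byte forms) are assumptions recalled from the BOCU-1 specification and should be checked against it. The function names are made up for this example.

```python
# Assumed BOCU-1 parameters (recalled from the specification, not taken
# from this message); verify against the spec before relying on them.
TRAIL_COUNT = 243                    # byte values usable as trail bytes
SINGLE = 64                          # single-byte differences: -64..+63
LEAD_2, LEAD_3, LEAD_4 = 43, 3, 1    # lead bytes per direction, by length

MAX_CODE_POINT = 0x10FFFF


def reaches():
    """Return {byte length: (most negative, most positive) difference}."""
    pos = SINGLE - 1
    neg = -SINGLE
    out = {1: (neg, pos)}
    for length, leads in ((2, LEAD_2), (3, LEAD_3), (4, LEAD_4)):
        # Each lead byte is followed by (length - 1) trail bytes.
        span = leads * TRAIL_COUNT ** (length - 1)
        pos += span
        neg -= span
        out[length] = (neg, pos)
    return out


def encoded_length(diff):
    """Number of bytes needed for a given code point difference."""
    for length, (neg, pos) in reaches().items():
        if neg <= diff <= pos:
            return length
    raise ValueError("difference not encodable")


if __name__ == "__main__":
    for length, (neg, pos) in reaches().items():
        print(length, hex(neg), hex(pos))
    # The four-byte form can express far more than any valid jump.
    print(reaches()[4][1] > MAX_CODE_POINT)
```

Under these assumptions the three-byte form reaches from -0x2DD0C to +0x2DD0B, so the first four-byte differences are +0x2DD0C and -0x2DD0D. The four-byte form could express differences well beyond ±0x10FFFF, which is why such sequences are possible but invalid.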
> So far they didn't tell me that my BOCU-1 script needs a license -
> okay, that's no serious objection. IMO nobody needs a special
> compression for Unicode anyway. But in theory BOCU-1 is nice,
> especially if compared with SCSU.
The letter from IBM reproduced in PDUTS #40 says: "IBM would like to
bring to your attention, US Patent 6737994 'Binary-Ordered Compression
For Unicode', which may contain claims necessary to, or which may
facilitate the implementation of, BOCU-1." "Necessary to... the
implementation of" means you cannot implement BOCU without infringing on
IBM's patent, unless IBM has granted you a license. IBM is known for
enforcing its patents, sooner or later. I tried for months
to obtain a developer-friendly clarification of this restriction --
something akin to the "freely available" clause in UTR #16
(UTF-EBCDIC) -- and was utterly unable to do so.
>> memories of the Unisys GIF patent are still too fresh in my mind.
>
> The LZW patent is expired worldwide now. It was possible to create
> uncompressed GIFs, http://purl.net/xyzzy/pub/clear1x1.gif (45 bytes)
> vs. clearlzw.gif (43 bytes) is an admittedly silly example.
That's why I said the *memories* were fresh. When the BOCU patent
expires, I might consider dusting off my (fully compliant) encoder and
decoder.
--
Doug Ewell * Fullerton, California, USA * RFC 4645 * UTN #14
http://users.adelphia.net/~dewell/
http://www1.ietf.org/html.charters/ltru-charter.html
http://www.alvestrand.no/mailman/listinfo/ietf-languages