From: Doug Ewell (doug@ewellic.org)
Date: Thu Dec 17 2009 - 23:23:43 CST
"verdy_p" <verdy underscore p at wanadoo dot fr> wrote:
>> According to the section "Intellectual Property" in UTN #6, users
>> must request a license from IBM to implement BOCU-1. Can you point
>> me to a passage somewhere that grants a "free license for free use"
>> without requesting permission from IBM?
>
> They are not required to get a licence if they use ICU to support
> BOCU-1 with full compliance, since ICU is legally licenced in an open
> way that does not require personal permission from IBM: IBM has
> already given us this permission in ICU, accepting its own open
> licencing terms.
I don't count it as a free license for free use if I have to use a
certain vendor's tool, no matter how wonderful it is -- especially if
that tool has licensing terms, no matter how liberal they are. This
might be fine for some technologies and file formats, but we are talking
about a *character encoding*, for heaven's sake. I should be able to
write my own implementation in 6502 assembly code if I want to.
> I have NOT said that BOCU (without the "-1" suffix) is open/free:
I know you haven't. It is patented, and because of that, profiles of
BOCU such as BOCU-1 are patented too. But then, Marcia Courtemanche
already told us that.
> As a consequence, it's impossible to adapt BOCU to make it conform
> to ISO 8859 requirements, or even to ISO 646 requirements, or just to
> filesystem naming requirements (slashes, dots, or ASCII letter case
> folding), without asking for such a permission. (It's possible to do
> that with BOCU under such a licence, but completely impossible with
> BOCU-1 without breaking it).
One of the claims in the BOCU patent is "[t]he method... wherein the
characters requiring higher code point numbers [than U+0020] are Greek."
I take that to mean that ASCII opacity is part of the nature of BOCU.
> The patent is however highly questionable: it attempts to cover cases
> that have long been free (notably it covers all numeric bases,
> not just the base-243 used in BOCU-1): it could as well cover Base64
> or Hexadecimal or Base85 of PostScript, or the encoding used in
> Punycode! The principles of decomposition of numbers in a numeric
> base, and the principles of representing non-decimal digits with a
> single octet mapped differently from the numeric value of the digit,
> have been in use for a very long time. This is also true of the
> variable-length encoding of string lengths (just using bit pattern
> prefixes here, for Huffman coding using predetermined statistics).
Makes you wonder what sort of research is being done by USPTO.
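For readers who want to see how ordinary the quoted technique is, here is a minimal Python sketch of it: decompose a number into digits of some base, then map each digit to a byte through a lookup table. This is not BOCU-1 and not taken from the patent. The names `to_digits`, `encode`, `decode` and the `ALPHABET` table are hypothetical, and the 243-byte alphabet is an arbitrary choice made only so that the base matches the one mentioned above.

```python
BASE = 243

# Hypothetical digit alphabet: 243 byte values picked arbitrarily for
# illustration (0x0D..0xFF). This is NOT the BOCU-1 trail-byte table.
ALPHABET = bytes(range(13, 256))
assert len(ALPHABET) == BASE

# Reverse table: byte value -> digit value.
DIGIT_OF = {byte: digit for digit, byte in enumerate(ALPHABET)}


def to_digits(n, base=BASE):
    """Return the digits of non-negative n in the given base, most significant first."""
    if n < 0:
        raise ValueError("n must be non-negative")
    digits = []
    while True:
        n, d = divmod(n, base)
        digits.append(d)
        if n == 0:
            break
    return digits[::-1]


def encode(n):
    """Map each base-243 digit of n to one byte via ALPHABET."""
    return bytes(ALPHABET[d] for d in to_digits(n))


def decode(data):
    """Inverse of encode: rebuild the number from its digit bytes."""
    n = 0
    for byte in data:
        n = n * BASE + DIGIT_OF[byte]
    return n
```

Swapping in base 16, 64 or 85 with a different table gives the general shape of hexadecimal, Base64 or Base85 digit mapping, which is the point being made about prior art.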
> Maybe the only difference with other algorithms is that BOCU uses two
> distinct mappings from digits (whose values are all those of a
> single-base positional numeric system) into byte values: one subset
> of byte values (alphabet) for the remaining lower bits only in the
> prefix byte (to encode the most significant digit), and another
> (larger) alphabet for the remaining digits.
I'm not sure what this means, but all multiple-byte character encodings
have different ranges for lead bytes and trail bytes. Self-delimiting
numeric values use a different range for the last byte of the sequence.
So this idea isn't novel either.
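As one illustration of a self-delimiting numeric value, here is a minimal Python sketch of a big-endian variable-length quantity in which every non-final byte falls in 0x80..0xFF and the final byte falls in 0x00..0x7F. It is a generic example of the "different range for the last byte" idea, not anything specific to BOCU; the function names `encode_vlq` and `decode_vlq` are hypothetical.

```python
def encode_vlq(n):
    """Encode non-negative n in 7-bit groups, most significant group first.

    Non-final bytes have the high bit set (0x80..0xFF); the final byte
    has it clear (0x00..0x7F), so the value delimits itself.
    """
    if n < 0:
        raise ValueError("n must be non-negative")
    out = [n & 0x7F]                    # final byte, high bit clear
    n >>= 7
    while n:
        out.append(0x80 | (n & 0x7F))   # non-final byte, high bit set
        n >>= 7
    return bytes(reversed(out))


def decode_vlq(data, pos=0):
    """Decode one value starting at pos; return (value, next position)."""
    n = 0
    while True:
        byte = data[pos]
        pos += 1
        n = (n << 7) | (byte & 0x7F)
        if byte < 0x80:                 # byte in the "last byte" range
            return n, pos
```

Because the byte range alone marks the end of each value, several values can be concatenated and read back without any separate length field.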
I'd be surprised to see any real-world text encoded in BOCU-1, not only
because it's probably the world's only IP-encumbered character encoding,
but because it has been stigmatized by the HTML 5 Working Draft
<http://www.w3.org/TR/html5/>, which actually *forbids* conformant user
agents from recognizing it (along with CESU-8 and UTF-7 and SCSU).
--
Doug Ewell | Thornton, Colorado, USA | http://www.ewellic.org
RFC 5645, 4645, UTN #14 | ietf-languages @ http://is.gd/2kf0s