From: Doug Ewell (doug@ewellic.org)
Date: Fri Dec 18 2009 - 08:39:28 CST
"verdy_p" <verdy underscore p at wanadoo dot fr> wrote:
> Separate ranges have a benefit: they allow fast text search algorithms
> to work reliably by allowing easy resynchronisation from random
> positions.
Separate ranges are a fundamental feature of UTF-8 and UTF-16. I don't remember
seeing a claim about separate ranges in the BOCU patent, but one would
think an attempt to claim that as an innovation would be untenable.
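
To illustrate the resynchronisation point, here is a minimal sketch in
Python. It assumes the input is well-formed UTF-8 (or a well-formed list
of UTF-16 code units); the function names are made up for this example
and do not come from any library. Because lead and trail values occupy
disjoint ranges, a reader dropped at a random position can find the start
of the current character by looking only at nearby values.

```python
def is_utf8_continuation(byte: int) -> bool:
    # UTF-8 continuation bytes always match the bit pattern 10xxxxxx,
    # a range that lead bytes and ASCII bytes never use.
    return (byte & 0xC0) == 0x80


def resync_utf8(data: bytes, pos: int) -> int:
    """Return the index of the first byte of the character containing data[pos].

    Hypothetical helper; assumes `data` is well-formed UTF-8.
    """
    while pos > 0 and is_utf8_continuation(data[pos]):
        pos -= 1
    return pos


def resync_utf16(units: list, pos: int) -> int:
    """Return the index of the first code unit of the character at units[pos].

    Hypothetical helper; assumes `units` is a well-formed list of
    16-bit code units.
    """
    is_trail = 0xDC00 <= units[pos] <= 0xDFFF
    if is_trail and pos > 0 and 0xD800 <= units[pos - 1] <= 0xDBFF:
        return pos - 1
    return pos


data = "naïve €uro".encode("utf-8")
start = resync_utf8(data, 9)           # byte 9 is inside the euro sign
print(start, data[start:start + 3].decode("utf-8"))

units = [0x0061, 0xD83D, 0xDE00]       # "a" followed by U+1F600
print(resync_utf16(units, 2))          # lands on the lead surrogate
```

At most three steps backwards are needed in UTF-8 and one in UTF-16,
which is what makes searching from an arbitrary offset reliable.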
> I did not know that HTML5 *forbade* supporting some MIME-registered
> charsets.
>
> Do you mean instead that it forbids recognizing automatically when the
> charset is unknown (not specified by the resource server, and not
> specified with the source link) and must be guessed from the bytes
> content of the stream?
From
http://www.w3.org/TR/html5/infrastructure.html#character-encodings-0 :
"User agents must not support the CESU-8, UTF-7, BOCU-1 and SCSU
encodings."
Amazing, isn't it? So thoughtful of the HTML 5 WG to protect
developers' time by prohibiting a handful of selected encodings. I can
support Fieldata or PTTC/EBCD in my user agent if I want to, but not
UTF-7 or SCSU.
> You don't have to use ICU actually. ICU components can be fully
> isolated and rewritten in any other language. But you have to include
> its licence as your new work will be a derived work based on a
> copyrighted work, even if it does not use any piece of its source
> code.
Right. So suppose I want to implement BOCU-1 from scratch, possibly in
an attempt to speed up encoding or decoding? Can't do it without asking
IBM for a license. (Note that I haven't actually looked at the ICU code
to see if it is already optimally fast. You get the point.)
> Almost all software today includes several copyright notices
I'm not interested, for the moment, in the copyright notices attached to
software or libraries or other development tools. BOCU-1 is a
compression encoding, a relatively straightforward way (compared to gzip
and such) to represent Unicode characters as a sequence of bytes,
similar to UTF-8 and -7 and -16 and -32 and SCSU and
ASCII-with-XML-entities and all the rest. But only BOCU-1 among these
requires me to even think about licenses.
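
As a rough illustration of "characters as a sequence of bytes", the
sketch below encodes one sample string in several of the forms named
above, using only codecs that ship with Python. The sample string is
arbitrary, and SCSU and BOCU-1 are left out because the Python standard
library has no codec for them.

```python
sample = "Ünïcödé €"

# Each entry is the same text represented as a different byte sequence.
forms = {
    "UTF-8": sample.encode("utf-8"),
    "UTF-7": sample.encode("utf-7"),
    "UTF-16BE": sample.encode("utf-16-be"),
    "UTF-32BE": sample.encode("utf-32-be"),
    # Non-ASCII characters become numeric XML character references.
    "ASCII with XML entities": sample.encode("ascii", errors="xmlcharrefreplace"),
}

for name, encoded in forms.items():
    print(f"{name:24} {len(encoded):3d} bytes  {encoded!r}")
```

Every one of these can be written from the published description alone,
which is the contrast being drawn with BOCU-1.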
> For this reason, I don't consider the ICU licence intrusive and
> blocking, and BOCU-1 as provided through ICU, is both a free (FSF
> definition) and open (OSI definition) software which does not restrict
> rewriting it completely.
I haven't read the ICU license thoroughly, but I'd be surprised if
"rewriting it completely" is allowed.
--
Doug Ewell | Thornton, Colorado, USA | http://www.ewellic.org
RFC 5645, 4645, UTN #14 | ietf-languages @ http://is.gd/2kf0s