From: Kenneth Whistler (kenw@sybase.com)
Date: Mon Jan 22 2007 - 14:36:45 CST
Frank Ellermann objected:
> Doug Ewell wrote:
>
> > The greatest roadblock to acceptance of SCSU is its *perception* of
> > complexity. It is not nearly as complicated as it is perceived to be,
> > and I say this having implemented both simple and optimized encoders as
> > well as decoders. Algorithms like MD5 and Punycode and gzip are quite a
> > bit more complex
>
> Wait a moment, I've implemented MD5, UTF-1, UTF-7, and BOCU-1, and so far
> I gave up on SCSU. To say that it's horrible would be putting it mildly.
I've got to side with Doug here. As he pointed out, the decoder
for SCSU is trivial. And in UTN #14 Doug has written the pseudocode
for an encoder in one page. I implemented the precursor of
SCSU years ago, and while optimizing the encoder can be tricky,
there really isn't all that much to it. (The main difference
between the precursor to SCSU and SCSU itself is that SCSU
defines a bunch of static windows, whereas the precursor just
did everything by calculating one dynamic window.)
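To make the "decoder is trivial" point concrete, here is a minimal sketch of an SCSU decoder. It is not the pseudocode from UTN #14. It follows the tag byte values and window tables published in UTS #6 (8 static windows, 8 dynamic windows, single-byte and Unicode modes). It assumes well-formed input and does no bounds checking. The function names are hypothetical.

```python
# Minimal SCSU decoder sketch (tag values and window tables per UTS #6).
# Assumptions: well-formed input; no bounds checks; hypothetical names.

STATIC = [0x0000, 0x0080, 0x0100, 0x0300, 0x2000, 0x2080, 0x2100, 0x3000]
DYNAMIC_DEFAULT = [0x0080, 0x00C0, 0x0400, 0x0600, 0x0900, 0x3040, 0x30A0, 0xFF00]
SPECIAL = {0xF9: 0x00C0, 0xFA: 0x0250, 0xFB: 0x0370, 0xFC: 0x0530,
           0xFD: 0x3040, 0xFE: 0x30A0, 0xFF: 0xFF60}


def window_offset(x: int) -> int:
    """Map the byte following SDn/UDn to a dynamic window start offset."""
    if 0x01 <= x <= 0x67:
        return x * 0x80
    if 0x68 <= x <= 0xA7:
        return x * 0x80 + 0xAC00
    if x in SPECIAL:
        return SPECIAL[x]
    raise ValueError(f"reserved window offset byte {x:#04x}")


def extended_offset(hi: int, lo: int) -> tuple[int, int]:
    """Decode the two bytes after SDX/UDX into (window number, offset)."""
    return hi >> 5, 0x10000 + (((hi & 0x1F) << 8) | lo) * 0x80


def scsu_decode(data: bytes) -> str:
    dynamic = list(DYNAMIC_DEFAULT)
    active = 0            # currently selected dynamic window
    unicode_mode = False  # SCSU starts in single-byte mode
    out = []              # code points; quoted surrogates are re-paired at the end
    i = 0
    while i < len(data):
        b = data[i]
        i += 1
        if not unicode_mode:
            if b >= 0x80:                          # char from active dynamic window
                out.append(dynamic[active] + b - 0x80)
            elif b in (0x00, 0x09, 0x0A, 0x0D) or b >= 0x20:
                out.append(b)                      # ASCII pass-through
            elif 0x01 <= b <= 0x08:                # SQ0..SQ7: quote one char
                n, c = b - 0x01, data[i]
                i += 1
                out.append(STATIC[n] + c if c < 0x80 else dynamic[n] + c - 0x80)
            elif 0x10 <= b <= 0x17:                # SC0..SC7: select window
                active = b - 0x10
            elif 0x18 <= b <= 0x1F:                # SD0..SD7: define and select
                active = b - 0x18
                dynamic[active] = window_offset(data[i])
                i += 1
            elif b == 0x0B:                        # SDX: define extended and select
                active, offset = extended_offset(data[i], data[i + 1])
                dynamic[active] = offset
                i += 2
            elif b == 0x0E:                        # SQU: quote one UTF-16 unit
                out.append((data[i] << 8) | data[i + 1])
                i += 2
            elif b == 0x0F:                        # SCU: switch to Unicode mode
                unicode_mode = True
            else:                                  # 0x0C is reserved
                raise ValueError(f"reserved tag {b:#04x}")
        else:
            if 0xE0 <= b <= 0xE7:                  # UC0..UC7: back to single-byte
                active, unicode_mode = b - 0xE0, False
            elif 0xE8 <= b <= 0xEF:                # UD0..UD7: define, back to single-byte
                active, unicode_mode = b - 0xE8, False
                dynamic[active] = window_offset(data[i])
                i += 1
            elif b == 0xF1:                        # UDX: define extended, back to single-byte
                active, offset = extended_offset(data[i], data[i + 1])
                dynamic[active] = offset
                unicode_mode = False
                i += 2
            elif b == 0xF0:                        # UQU: quote a unit that looks like a tag
                out.append((data[i] << 8) | data[i + 1])
                i += 2
            elif b == 0xF2:
                raise ValueError("reserved tag 0xf2")
            else:                                  # plain big-endian UTF-16 unit
                out.append((b << 8) | data[i])
                i += 1
    # Surrogates may arrive as separate units; round-trip through UTF-16 to pair them.
    return "".join(map(chr, out)).encode("utf-16-be", "surrogatepass").decode("utf-16-be")
```

For example, the byte sequence 12 9C BE C1 BA B2 B0 selects dynamic window 2 (Cyrillic, U+0400) with SC2 and then spends one byte per letter of "Москва". The whole decoder is essentially a table lookup per byte, which is the sense in which it is trivial.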
There really isn't any reason, with SCSU, to get bogged down
in trying to get some kind of theoretical best behavior out
of the encoder. For all reasonable purposes, good enough is
good enough. Rather than trying to tweak the optimization of
an SCSU encoder to gain another 1% in special cases, it makes
much more sense to simply choose another general compression
mechanism instead, for those special cases.
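As an illustration of how little an encoder needs in order to produce valid SCSU, here is a deliberately naive sketch. It is not Doug's UTN #14 algorithm and makes no attempt at optimization. It is an assumption-laden example with a hypothetical function name. It keeps U+0000..U+00FF in single-byte mode using the default dynamic window 0 (U+0080), quotes stray control characters with SQ0, and sends everything else as UTF-16 in Unicode mode, quoting units that collide with tag bytes.

```python
# Naive SCSU encoder sketch: valid output, no window optimization.
# Tag values per UTS #6; function name is hypothetical.

SQ0, SCU, UC0, UQU = 0x01, 0x0F, 0xE0, 0xF0


def scsu_encode_simple(text: str) -> bytes:
    out = bytearray()
    unicode_mode = False
    for ch in text:
        cp = ord(ch)
        if cp < 0x100:
            if unicode_mode:
                out.append(UC0)        # back to single-byte mode, window 0 (U+0080)
                unicode_mode = False
            if cp >= 0x20 or cp in (0x00, 0x09, 0x0A, 0x0D):
                out.append(cp)         # ASCII pass-through, or byte in window 0
            else:
                out += bytes([SQ0, cp])  # control char would read as a tag: quote it
        else:
            if not unicode_mode:
                out.append(SCU)        # switch to Unicode mode
                unicode_mode = True
            units = ch.encode("utf-16-be")  # one or two 16-bit units
            for k in range(0, len(units), 2):
                if 0xE0 <= units[k] <= 0xF2:
                    out.append(UQU)    # high byte collides with a Unicode-mode tag
                out += units[k:k + 2]
    return bytes(out)
```

Usage: `scsu_encode_simple("Öl fließt")` yields D6 6C 20 66 6C 69 65 DF 74. That is one byte per character, the same as Latin-1. For Cyrillic or other scripts outside U+0000..U+00FF, this sketch falls back to UTF-16 size. Recovering one byte per character there is exactly the job of the dynamic-window logic that a real encoder adds.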
--Ken
P.S. As for the main topic of this thread, I have to agree with Doug,
Mark, and David Starner. There is nothing compelling enough
about the design of UTF-21/24 that would give it any advantage,
either for storage or for processing, over the existing
UTF-8 and UTF-16.
This archive was generated by hypermail 2.1.5 : Mon Jan 22 2007 - 14:38:52 CST