From: Doug Ewell (dewell@adelphia.net)
Date: Tue May 08 2007 - 00:00:17 CDT
Richard Wordingham <richard dot wordingham at ntlworld dot com> wrote:
> The algorithm given is clearly for compressing UTF-16 data. Look at
> the sign test for three byte difference values. (It could be
> adjusted/corrected to handle arbitrary codepoint differences.) I
> wonder if SCSU would out-perform the algorithm on, say, Shavian.
Shavian can be encoded extremely efficiently in SCSU: only one byte per
character, plus three bytes of overhead (0B 60 08) at the start of the
stream to set up a dynamic window, and one extra byte (01) to quote each U+00B7
"namer dot." I doubt the simplified LZ method presented in UTN #31 can
top this, but of course there's nothing like experimentation.
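
For anyone who wants to try the experiment, here is a minimal sketch of the encoding described above. It assumes my reading of the three setup bytes: 0B is the SDX tag, and 60 08 selects dynamic window 3 positioned at U+10400, the 0x80-aligned window containing the Shavian block. It also assumes window 0 stays at its default position (U+0080), so that 01 (SQ0) can quote U+00B7. The function name is made up for illustration, and this is not a general SCSU encoder: it handles only Shavian letters, ASCII, and Latin-1 characters.

```python
SDX = 0x0B  # SCSU tag: define and select an extended dynamic window
SQ0 = 0x01  # SCSU tag: quote a single character from window 0


def encode_shavian_scsu(text: str) -> bytes:
    """Hypothetical helper: encode Shavian text in SCSU single-byte mode."""
    window = 3                          # window number implied by the 0x60 byte
    base = 0x10400                      # window start covering U+10450..U+1047F
    offset = (base - 0x10000) // 0x80   # 13-bit window offset, here 8

    # Three bytes of overhead: 0B 60 08
    out = bytearray([SDX, (window << 5) | (offset >> 8), offset & 0xFF])

    for ch in text:
        cp = ord(ch)
        if base <= cp < base + 0x80:
            # Inside the active window: one byte per character.
            out.append(0x80 + (cp - base))
        elif 0x20 <= cp < 0x80 or cp in (0x09, 0x0A, 0x0D):
            # ASCII and the common controls pass through unchanged.
            out.append(cp)
        elif 0x80 <= cp < 0x100:
            # Latin-1, e.g. the U+00B7 namer dot: quote from window 0,
            # which is assumed to still sit at its default U+0080.
            out += bytes([SQ0, cp])
        else:
            raise ValueError(f"U+{cp:04X} is not handled by this sketch")
    return bytes(out)


if __name__ == "__main__":
    sample = "\u00b7\U00010456\U00010450"   # namer dot plus two Shavian letters
    print(encode_shavian_scsu(sample).hex(" "))
```

Comparing len(encode_shavian_scsu(s)) against the output size of the UTN #31 method on the same text would settle the question for any given sample.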
--
Doug Ewell * Fullerton, California, USA * RFC 4645 * UTN #14
http://users.adelphia.net/~dewell/
http://www1.ietf.org/html.charters/ltru-charter.html
http://www.alvestrand.no/mailman/listinfo/ietf-languages