Re: SCSU question

From: Tony Graham (
Date: Fri May 12 2000 - 11:03:58 EDT

At 12 May 2000 00:32 -0800, Vaintroub, Wladislav wrote:
> Is there a general rule about SCSU-comparison? Or it will always depend on
> SCSU-implementation?

The only comparison of SCSU-compressed strings that you might expect
to be accurate is a comparison of two strings compressed using the
same SCSU implementation using the same settings (assuming the
implementation has user-controllable settings).

SCSU defines sixteen 128-character windows -- eight static and eight
dynamic. You can even position the dynamic windows to cover the
ranges of most of the static windows. This can be useful since most
static windows allow only "non-locking shifts" where, for each
character, you select the window and specify the offset in the window.
In contrast, the dynamic windows allow locking shifts, where you
select the window once and then use one byte per character to specify
offsets within the window (until you specify something else).
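To make the difference between the two kinds of shift concrete, here
is a minimal Python sketch. It is not a real compressor: it assumes
the whole string fits in one dynamic window left at its default
position, and the helper names (encode_locking, encode_quoted) are
made up for illustration. The tag values and default window offsets
are the ones I understand UTS #6 to define. Note that the SQn quote
tags reach dynamic window n as well as static window n, depending on
whether the following byte is 0x80 or above.

```python
# Default start offsets of the eight dynamic windows (per UTS #6).
DEFAULT_DYNAMIC_OFFSETS = [0x0080, 0x00C0, 0x0400, 0x0600,
                           0x0900, 0x3040, 0x30A0, 0xFF00]

SQ0 = 0x01  # SQ0..SQ7 = 0x01..0x08: quote ONE character from window n
SC0 = 0x10  # SC0..SC7 = 0x10..0x17: change (lock) to dynamic window n


def _window_bytes(text, window):
    """Yield the 0x80..0xFF byte for each character of text in the window."""
    base = DEFAULT_DYNAMIC_OFFSETS[window]
    for ch in text:
        delta = ord(ch) - base
        if not 0 <= delta < 128:
            raise ValueError("character %r is outside window %d" % (ch, window))
        yield 0x80 + delta


def encode_locking(text, window):
    """Hypothetical helper: select the window once, then one byte per character."""
    return bytes([SC0 + window]) + bytes(_window_bytes(text, window))


def encode_quoted(text, window):
    """Hypothetical helper: non-locking shift, so a tag byte before every character."""
    out = bytearray()
    for b in _window_bytes(text, window):
        out += bytes([SQ0 + window, b])
    return bytes(out)


# "Moskva" in Cyrillic; the default dynamic window 2 starts at U+0400.
text = "\u041c\u043e\u0441\u043a\u0432\u0430"
locked = encode_locking(text, 2)
quoted = encode_quoted(text, 2)
print(len(locked), locked.hex())
print(len(quoted), quoted.hex())
```

For this six-character string the locking form costs seven bytes (one
tag plus one byte per character) and the quoted form costs twelve.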

For the same input string, different SCSU implementations are going to
make different choices about where to position the windows. Even if
two implementations chose the exact same window offsets, they could
still assign them to different window numbers, thereby throwing off
any binary comparison of the compressed strings.
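As an illustration of that point, the sketch below decodes two
different byte sequences that both stand for the same six Cyrillic
characters. One selects the predefined dynamic window 2; the other
redefines window 0 to the same offset. The decoder is a deliberately
small subset written for this message (single-byte mode only, and
only the SQn, SCn and SDn tags with the simple offset form); the
function name scsu_decode_subset is mine, and a real decoder has to
handle considerably more.

```python
STATIC_OFFSETS = [0x0000, 0x0080, 0x0100, 0x0300,
                  0x2000, 0x2080, 0x2100, 0x3000]
DEFAULT_DYNAMIC_OFFSETS = [0x0080, 0x00C0, 0x0400, 0x0600,
                           0x0900, 0x3040, 0x30A0, 0xFF00]


def scsu_decode_subset(data):
    """Decode a small subset of SCSU single-byte mode (illustration only)."""
    dynamic = list(DEFAULT_DYNAMIC_OFFSETS)  # per-stream window positions
    active = 0                               # dynamic window 0 is active at start
    out = []
    i = 0
    while i < len(data):
        b = data[i]
        i += 1
        if 0x01 <= b <= 0x08:        # SQn: quote one character, no lock
            n = b - 0x01
            c = data[i]
            i += 1
            if c < 0x80:
                out.append(chr(STATIC_OFFSETS[n] + c))
            else:
                out.append(chr(dynamic[n] + c - 0x80))
        elif 0x10 <= b <= 0x17:      # SCn: lock to dynamic window n
            active = b - 0x10
        elif 0x18 <= b <= 0x1F:      # SDn: define window n and lock to it
            n = b - 0x18
            x = data[i]
            i += 1
            if not 0x01 <= x <= 0x67:
                raise ValueError("offset byte not handled by this sketch")
            dynamic[n] = x * 0x80
            active = n
        elif b >= 0x80:              # character in the active dynamic window
            out.append(chr(dynamic[active] + b - 0x80))
        elif b in (0x00, 0x09, 0x0A, 0x0D) or b >= 0x20:
            out.append(chr(b))       # passed through unchanged
        else:
            raise ValueError("tag 0x%02X not handled by this sketch" % b)
    return "".join(out)


# Same text, same window offset (U+0400), different window numbers.
uses_window_2 = bytes.fromhex("12 9c be c1 ba b2 b0")     # SC2, then text
uses_window_0 = bytes.fromhex("18 08 9c be c1 ba b2 b0")  # SD0 0x08, then text

print(uses_window_2 == uses_window_0)
print(scsu_decode_subset(uses_window_2) == scsu_decode_subset(uses_window_0))
```

The byte strings differ, so a binary comparison says "not equal",
while the decoded text is identical.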

A compression program is likely to analyse the input data to determine
where and when to position the dynamic windows. Much like gzip does,
an SCSU compression program may offer a "compress better" mode that
does more analysis up front, which costs more time, for a smaller
result. Consequently, it's possible that a single implementation
could produce different results for the same input based on the
"compress better" setting.
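The same effect is easy to observe with a general-purpose compressor.
The sketch below uses Python's zlib module as a stand-in (it is not
SCSU, and I use zlib rather than the gzip file format so that no
timestamp ends up in the output): the same input compressed at two
levels gives different bytes that decompress to the same data.

```python
import zlib

data = ("SCSU comparison depends on the implementation. " * 50).encode("ascii")

fast = zlib.compress(data, 1)  # least analysis, quickest
best = zlib.compress(data, 9)  # most analysis, "compress better"

# The compressed forms are not byte-for-byte equal ...
print(fast == best)
# ... but both decompress to the original input.
print(zlib.decompress(fast) == zlib.decompress(best) == data)
```

So the safe way to compare is to decompress first and compare the
results, not the compressed bytes.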


Tony Graham
Mulberry Technologies, Inc.
17 West Jefferson Street                    Direct Phone: 301/315-9632
Suite 207                                          Phone: 301/315-9631
Rockville, MD 20850                                  Fax: 301/315-8285
  Mulberry Technologies: A Consultancy Specializing in SGML and XML
