SCSU question

From: Doug Ewell
Date: Fri May 12 2000 - 10:28:45 EDT

"Vaintroub, Wladislav" wrote:

> If I compare (binary) two strings encoded in SCSU, will the result be
> the same as if I compare the corresponding Unicode strings?
> As far as I understand, this should work for some general scripts
> (Latin, Greek, Cyrillic, and so on).
> What about text with mixed scripts?
> Is there a general rule about SCSU comparison? Or will it always
> depend on the SCSU implementation?

No, SCSU strings cannot be compared directly, because different encoders
may generate different encoded forms of the same Unicode string. Even
if two encoders use the same general approach, they may use different
windows, so the tags would be different.
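
As a rough illustration of that point, the Python sketch below hand-builds
two SCSU byte sequences for the same Cyrillic word. The helper name
encode_cyrillic and its redefine_window_0 flag are hypothetical, and the
helper only handles text in U+0400..U+047F. One variant selects the
predefined Cyrillic window with SC2; the other redefines dynamic window 0
with SD0. Both should be legal SCSU, yet the bytes differ.

```python
SC2 = 0x12  # tag: select dynamic window 2, whose default offset is U+0400
SD0 = 0x18  # tag: define dynamic window 0 (next byte = offset index), select it


def encode_cyrillic(text: str, redefine_window_0: bool) -> bytes:
    """Hypothetical toy encoder for text lying entirely in U+0400..U+047F."""
    if any(not 0x0400 <= ord(ch) <= 0x047F for ch in text):
        raise ValueError("toy encoder handles U+0400..U+047F only")
    # Offset index 0x08 stands for 0x08 * 0x80 = U+0400.
    tag = bytes([SD0, 0x08]) if redefine_window_0 else bytes([SC2])
    # Each character becomes one byte in 0x80..0xFF, relative to the window.
    return tag + bytes(0x80 + ord(ch) - 0x0400 for ch in text)


word = "\u041c\u043e\u0441\u043a\u0432\u0430"  # "Moskva" in Cyrillic
a = encode_cyrillic(word, redefine_window_0=False)
b = encode_cyrillic(word, redefine_window_0=True)

print(a.hex(" "))  # 12 9c be c1 ba b2 b0
print(b.hex(" "))  # 18 08 9c be c1 ba b2 b0
print(a == b)      # False
```

A binary comparison of a and b reports a mismatch even though both
represent the same Unicode string.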

The only exception would be a pure-ASCII string, which any reasonable
SCSU encoder would leave as pure ASCII. If there is enough non-ASCII,
non-Latin-1 text, then even Latin-1 cannot be counted on to be
represented in a uniform way, since an encoder could theoretically
redefine dynamic window 0 to some base value other than U+0080.

You would need to decode the SCSU in both strings and then compare them.
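
A minimal sketch of decode-then-compare in Python follows. The function
scsu_decode_subset is a hypothetical, deliberately partial decoder, not a
conforming implementation: it assumes single-byte mode only and handles
just the SCn, SDn (offset indexes 0x01..0x67) and SQU tags, treating each
SQU as one BMP character. Anything else raises an error.

```python
# Default offsets of dynamic windows 0..7 as predefined by SCSU.
DEFAULT_WINDOWS = [0x0080, 0x00C0, 0x0400, 0x0600, 0x0900, 0x3040, 0x30A0, 0xFF00]


def scsu_decode_subset(data: bytes) -> str:
    """Hypothetical partial decoder: single-byte mode, SCn, SDn, SQU only."""
    windows = list(DEFAULT_WINDOWS)
    active = 0  # dynamic window 0 is active at the start
    out = []
    i = 0
    while i < len(data):
        b = data[i]
        i += 1
        if b >= 0x80:                      # byte relative to the active window
            out.append(chr(windows[active] + b - 0x80))
        elif b >= 0x20 or b in (0x00, 0x09, 0x0A, 0x0D):
            out.append(chr(b))             # ASCII passes through unchanged
        elif 0x10 <= b <= 0x17:            # SCn: select dynamic window n
            active = b - 0x10
        elif 0x18 <= b <= 0x1F:            # SDn: define window n and select it
            index = data[i]
            i += 1
            if not 0x01 <= index <= 0x67:
                raise ValueError("offset index outside the subset handled here")
            active = b - 0x18
            windows[active] = index * 0x80
        elif b == 0x0E:                    # SQU: quote one 16-bit code unit
            out.append(chr(int.from_bytes(data[i:i + 2], "big")))
            i += 2
        else:
            raise ValueError(f"tag 0x{b:02X} not handled in this sketch")
    return "".join(out)


# Three different byte sequences intended to encode the same Cyrillic word.
a = bytes.fromhex("12 9c be c1 ba b2 b0")                # SC2, predefined window
b = bytes.fromhex("18 08 9c be c1 ba b2 b0")             # SD0 redefined to U+0400
c = bytes.fromhex("0e041c 0e043e 0e0441 0e043a 0e0432 0e0430")  # SQU each time

print(a == b or b == c)                                   # False
print(scsu_decode_subset(a) == scsu_decode_subset(b) == scsu_decode_subset(c))  # True
```

The raw bytes disagree, but the decoded strings compare equal, which is
the comparison that actually answers the question.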

-Doug Ewell
 Fullerton, California
