On Fri, 27 Apr 2012 11:21:05 -0700
"Doug Ewell" <doug_at_ewellic.org> wrote:
> SCSU works equally well, or almost so, with any text sample where the
> non-ASCII characters fit into a single block of 128 code points. For
> anything other than Latin-1 you need one byte of overhead, to switch
> to another window, and for many scripts you need two, to define a
> window and switch to it. But again, two bytes is not what's holding
> anyone up.
With an SCSU encoder that avoids Unicode mode and UQU whenever possible,
most alphabetic languages work fairly well. However, extra window
offsets would be needed to cover the 15 half-blocks from A480 to ABFF,
i.e. 15 new offset codes. If I were being miserly, I wouldn't cover
A500-A5FF.
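As a rough check of that count, here is a small Python sketch. It assumes
the one-byte window offset codes are as I read them in UTS #6 (0x01-0x67
giving x*0x80, 0x68-0xA7 giving x*0x80 + 0xAC00, plus seven fixed
offsets); that table and the function names are my own assumptions and
should be checked against the specification.

```python
# Assumption (to be checked against UTS #6): the seven fixed window offsets
# selected by offset codes 0xF9..0xFF.
FIXED_OFFSETS = [0x00C0, 0x0250, 0x0370, 0x0530, 0x3040, 0x30A0, 0xFF60]


def dynamic_window_starts():
    """Start of every 128-code-point window a one-byte offset code can define."""
    starts = {x * 0x80 for x in range(0x01, 0x68)}             # 0080..3380
    starts |= {x * 0x80 + 0xAC00 for x in range(0x68, 0xA8)}   # E000..FF80
    starts |= set(FIXED_OFFSETS)
    return starts


def uncovered_half_blocks(first, last):
    """Half-blocks in [first, last] that no definable window overlaps at all."""
    starts = dynamic_window_starts()
    result = []
    for hb in range(first & ~0x7F, last + 1, 0x80):
        # A window starting at s overlaps half-block hb when |s - hb| < 0x80.
        if not any(abs(s - hb) < 0x80 for s in starts):
            result.append(hb)
    return result


missing = uncovered_half_blocks(0xA480, 0xABFF)
print(len(missing), [f"{hb:04X}" for hb in missing])
```

Under those assumptions every half-block from A480 to AB80 comes out as
unreachable, which is where the figure of 15 comes from.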
SCSU doesn't work well with large syllabaries, especially if they
include a lot of unused characters within the half-blocks used. Inuit
suffers badly from this, but still achieves noticeable compression. I
experimented with compressing Yi transposed to a covered range, and
found that it achieved something like 10% compression. Yi suffers from
needing the 8 dynamic windows to be switched between 10 half-blocks
(with occasional excursions to an 11th). If the Yi characters had
been arranged by tone first and initial consonant second, 2 of the
half-blocks would never have been used in my sample!
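To illustrate why 10 half-blocks against 8 dynamic windows hurts, here is
a toy Python model. It is not an SCSU encoder: it only counts which
half-blocks a text touches and how often a new window would have to be
defined if the 8 windows were recycled least-recently-used. The LRU policy
and all names are assumptions made for the illustration.

```python
from collections import OrderedDict


def half_block(cp):
    """Start of the 128-code-point half-block containing code point cp."""
    return cp & ~0x7F


def half_blocks_used(text):
    """Set of half-blocks touched by the non-ASCII characters of text."""
    return {half_block(ord(ch)) for ch in text if ord(ch) >= 0x80}


def count_window_redefinitions(text, slots=8):
    """Toy LRU model: how often a needed half-block is in none of the windows."""
    windows = OrderedDict()   # half-block start -> True, oldest first
    misses = 0
    for ch in text:
        cp = ord(ch)
        if cp < 0x80:
            continue          # ASCII needs no window
        hb = half_block(cp)
        if hb in windows:
            windows.move_to_end(hb)          # mark as most recently used
        else:
            misses += 1
            windows[hb] = True
            if len(windows) > slots:
                windows.popitem(last=False)  # drop least recently used
    return misses


# Hypothetical sample: one character from each of 10 half-blocks, repeated.
sample = "".join(chr(0xA000 + 0x80 * i) for i in range(10)) * 3
print(len(half_blocks_used(sample)), count_window_redefinitions(sample))
```

In this deliberately bad case every character forces a redefinition,
whereas the same pattern over 8 half-blocks would need only the initial 8
definitions.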
Vai (A500-A63F) fits in 3 half-blocks, and I would expect the non-Vai
characters in Vai text to fall in the static windows. Given how well Yi
performed, I expect Vai to benefit from SCSU.
Has anyone investigated the performance of SCSU with Cuneiform or
Egyptian Hieroglyphics? It might achieve better than 50% compression!
A fair comparison of Egyptian Hieroglyphics depends on the mark-up
used, for Unicode on its own does not enable one to write reasonable Middle
Egyptian.
Richard.
Received on Sat Apr 28 2012 - 13:00:51 CDT