Re: UTN #31 and direct compression of code points

From: Richard Wordingham (richard.wordingham@ntlworld.com)
Date: Mon May 07 2007 - 11:51:48 CDT

    Philippe Verdy wrote on Monday, May 07, 2007 8:53 AM
    Subject: RE: UTN #31 and direct compression of code points

    > In fact I am a bit puzzled by the comment on the second line of the sample
    > code below:
    > length = DecodeLength(&input);
    > offset = DecodeOffset(&input); // same algorithm as DecodeLength

    > For encoding the offset (in matches only), what is the use of bits 7 and
    > 6?
    > Couldn't we store up to 7 bits of the offset value (instead of 6 bits) in
    > the same byte without requiring an extra byte?
    >
    > If so, the two functions DecodeLength() and DecodeOffset() need to be
    > different.

    Perhaps the gain is small.
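
    To put a rough number on that gain, here is a minimal sketch in C. It
    does not reproduce the UTN #31 byte layout. It assumes a hypothetical
    variable-length format in which the first byte carries first_bits
    payload bits with a continuation flag in the bit just above them, and
    each later byte carries 7 payload bits with bit 7 as the flag. The names
    encode_varint, decode_varint and first_bits are invented for
    illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* HYPOTHETICAL format, not taken from UTN #31.
 * First byte: `first_bits` low payload bits, continuation flag in the
 * bit directly above them. Later bytes: 7 payload bits, flag in bit 7.
 * Writes at most 6 bytes to `out` and returns the number written. */
static size_t encode_varint(uint32_t value, unsigned first_bits, uint8_t *out)
{
    assert(first_bits >= 1 && first_bits <= 7);
    size_t n = 0;
    uint8_t byte = (uint8_t)(value & ((1u << first_bits) - 1u));
    value >>= first_bits;
    if (value != 0)
        byte |= (uint8_t)(1u << first_bits);   /* more bytes follow */
    out[n++] = byte;
    while (value != 0) {
        byte = (uint8_t)(value & 0x7Fu);
        value >>= 7;
        if (value != 0)
            byte |= 0x80u;                     /* more bytes follow */
        out[n++] = byte;
    }
    return n;
}

/* Inverse of encode_varint; advances *input past the bytes consumed. */
static uint32_t decode_varint(const uint8_t **input, unsigned first_bits)
{
    assert(first_bits >= 1 && first_bits <= 7);
    const uint8_t *p = *input;
    uint32_t value = *p & ((1u << first_bits) - 1u);
    int more = (*p++ & (1u << first_bits)) != 0;
    unsigned shift = first_bits;
    while (more && shift < 32) {               /* guard against over-long input */
        value |= (uint32_t)(*p & 0x7Fu) << shift;
        shift += 7;
        more = (*p++ & 0x80u) != 0;
    }
    *input = p;
    return value;
}
```

    Under these assumptions an offset of 100 takes two bytes when the first
    byte holds 6 payload bits and one byte when it holds 7. The saving is
    one byte for offsets from 64 to 127, and again at each later size
    boundary, so how much it matters depends on how the match offsets are
    distributed.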

    > However, I wonder whether the fixed-size little-endian 16-bit format
    > for the first character in a literal is appropriate. Why couldn't it
    > be represented as a code point difference, as in the rest of the
    > literal?

    The algorithm as given is clearly designed for compressing UTF-16 data:
    look at the sign test for three-byte difference values. (It could be
    adjusted or corrected to handle arbitrary code point differences.) I
    wonder whether SCSU would outperform the algorithm on, say, Shavian.
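
    A minimal sketch of why Shavian is an awkward case for anything that
    takes differences over UTF-16 code units. The Shavian block is
    U+10450..U+1047F, outside the BMP, so every letter is a surrogate pair.
    The helper names below (to_utf16, max_abs_unit_delta, max_abs_cp_delta)
    are invented for illustration, and the code does not implement the
    UTN #31 difference encoding; it only compares the size of the
    differences in the two views of the same text.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Encode one code point as UTF-16; returns the number of code units. */
static size_t to_utf16(uint32_t cp, uint16_t out[2])
{
    assert(cp <= 0x10FFFFu && !(cp >= 0xD800u && cp <= 0xDFFFu));
    if (cp < 0x10000u) {
        out[0] = (uint16_t)cp;
        return 1;
    }
    cp -= 0x10000u;
    out[0] = (uint16_t)(0xD800u + (cp >> 10));    /* high surrogate */
    out[1] = (uint16_t)(0xDC00u + (cp & 0x3FFu)); /* low surrogate  */
    return 2;
}

static int32_t abs32(int32_t v) { return v < 0 ? -v : v; }

/* Largest absolute difference between adjacent UTF-16 code units. */
static int32_t max_abs_unit_delta(const uint32_t *cps, size_t n)
{
    int32_t max = 0;
    int have_prev = 0;
    uint16_t prev = 0;
    for (size_t i = 0; i < n; i++) {
        uint16_t u[2];
        size_t k = to_utf16(cps[i], u);
        for (size_t j = 0; j < k; j++) {
            if (have_prev) {
                int32_t d = abs32((int32_t)u[j] - (int32_t)prev);
                if (d > max)
                    max = d;
            }
            prev = u[j];
            have_prev = 1;
        }
    }
    return max;
}

/* Largest absolute difference between adjacent code points. */
static int32_t max_abs_cp_delta(const uint32_t *cps, size_t n)
{
    int32_t max = 0;
    for (size_t i = 1; i < n; i++) {
        int32_t d = abs32((int32_t)cps[i] - (int32_t)cps[i - 1]);
        if (d > max)
            max = d;
    }
    return max;
}
```

    For the three letters U+10450, U+10451, U+1047F the code point
    differences never exceed 0x2E, while the code unit stream
    D801 DC50 D801 DC51 D801 DC7F swings by more than 0x400 at every step.
    A coder working on code point differences would see the small values;
    one working on 16-bit units would not.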

    Richard.


