Re: UTN #31 and direct compression of code points

From: Richard Wordingham (richard.wordingham@ntlworld.com)
Date: Mon May 07 2007 - 11:51:48 CDT

Next message: Richard Wordingham: "Re: UTN #31 and direct compression of code points"

Previous message: Marion Gunn: "Re: Uppercase ß is coming? (U+1E9E)"
In reply to: Philippe Verdy: "RE: UTN #31 and direct compression of code points"
Next in thread: Doug Ewell: "Re: UTN #31 and direct compression of code points"
Reply: Doug Ewell: "Re: UTN #31 and direct compression of code points"

Philippe Verdy wrote on Monday, May 07, 2007 8:53 AM
Subject: RE: UTN #31 and direct compression of code points

> In fact I am a bit puzzled by the comment on the second line of the sample
> code below:
> length = DecodeLength(&input);
> offset = DecodeOffset(&input); // same algorithm as DecodeLength

> For encoding the offset (in matches only), what is the use of bits 7 and 6?
> Couldn't we store up to 7 bits of the offset value (instead of 6 bits) in
> the same byte without requiring an extra byte?
>
> If so, the two functions DecodeLength() and DecodeOffset() need to be
> different.

Perhaps the gain is small.
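
To put a rough number on that, here is a minimal sketch in C. It assumes a
hypothetical layout that is NOT taken from UTN #31: the first byte carries
either 6 or 7 bits of the value, and every following byte carries 7 more.
The name encoded_size and that layout are illustrative only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout (an assumption, not the UTN #31 format):
 * the first byte holds first_payload_bits bits of the value and
 * each following byte holds 7 more bits.
 * Returns the number of bytes needed to store value. */
size_t encoded_size(uint32_t value, unsigned first_payload_bits)
{
    assert(first_payload_bits >= 1 && first_payload_bits <= 8);

    size_t bytes = 1;
    unsigned capacity_bits = first_payload_bits;

    /* Add continuation bytes until the value fits. */
    while (capacity_bits < 32 && (value >> capacity_bits) != 0) {
        capacity_bits += 7;
        bytes++;
    }
    return bytes;
}
```

Under that assumption, the 7-bit variant saves one byte only for offsets in
64..127, 8192..16383 and the corresponding higher bands; for every other
offset both variants need the same number of bytes.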

> However, I wonder whether the fixed-size little-endian 16-bit format for
> the first character in a literal is appropriate. Why couldn't it be
> represented as a code point difference, as in the rest of the literal?

The algorithm as given is clearly designed for compressing UTF-16 data: look
at the sign test for three-byte difference values. (It could be adjusted or
corrected to handle arbitrary code point differences.) I wonder whether SCSU
would outperform the algorithm on, say, Shavian.
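
One reason Shavian makes an interesting test case: the block sits at
U+10450..U+1047F, outside the BMP, so every character is a surrogate pair in
UTF-16. The sketch below only compares differences between code points with
differences between UTF-16 code units. The helper names to_surrogates and
delta are illustrative, and nothing here reproduces the difference encoding
of UTN #31 itself.

```c
#include <assert.h>
#include <stdint.h>

/* Split a supplementary code point (U+10000..U+10FFFF) into its
 * UTF-16 surrogate pair. */
void to_surrogates(uint32_t cp, uint16_t *high, uint16_t *low)
{
    assert(cp >= 0x10000u && cp <= 0x10FFFFu);

    uint32_t v = cp - 0x10000u;                 /* 20-bit value */
    *high = (uint16_t)(0xD800u + (v >> 10));    /* top 10 bits */
    *low  = (uint16_t)(0xDC00u + (v & 0x3FFu)); /* bottom 10 bits */
}

/* Signed difference between two consecutive values, which may be
 * code points or UTF-16 code units. */
int32_t delta(uint32_t prev, uint32_t next)
{
    return (int32_t)next - (int32_t)prev;
}
```

For U+10450 followed by U+10451, the code point difference is 1, whereas the
code unit sequence D801 DC50 D801 DC51 gives differences of +1103, -1103 and
+1103. How much that costs depends on how the difference values are encoded,
which this sketch does not model.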

Richard.

