DUDE-8, a compression proposal

From: John Cowan (cowan@mercury.ccil.org)
Date: Fri Jun 22 2001 - 21:23:59 EDT


It's DUDE-8! It's quick! It's easy! It does it all!

The DUDE-8 algorithm compresses 21-bit Unicode Scalar Values into bytes.
It is based on the DUDE(-6) algorithm currently being proposed for
i18n of DNS names.

1. Let prev = 0.
2. For each Unicode Scalar value usv, let prev = prev xor USV.
3. Let the value of prev be expressed in 21 bits as xxxxxxxyyyyyyyzzzzzzz.
4. Encode prev in 3 bytes as 0xxxxxxx 0yyyyyyy 1zzzzzzz.
5. Emit all non-zero bytes.
6. Repeat until done.

The optional signature of DUDE-8 is 03 FD FF, which is how U+FEFF appears at the
beginning of a DUDE-8 compression stream.

DUDE-8 is simpler and simpler than SCSU, but doesn't allow recovery from
garbles or even partial random or backwards access.

-- 
John Cowan                                   cowan@ccil.org
One art/there is/no less/no more/All things/to do/with sparks/galore
	--Douglas Hofstadter



This archive was generated by hypermail 2.1.2 : Fri Jul 06 2001 - 00:17:18 EDT