Re: A UTF-8 based News Service

From: DougEwell2@cs.com
Date: Thu Jul 12 2001 - 11:46:22 EDT


In a message dated 2001-07-12 8:27:20 Pacific Daylight Time,
unicode@abyssiniacybergateway.net writes:

> As someone involved in the service I often wish there was some
> form of "compressed" Unicode encoding. The 3-byte penalty that
> Ethiopic bears under UTF-8 turns into higher bandwidth that web
> hosting services meter and charge for by the megabyte. For a
> popular site this soon makes UTF-8 a costly option to support.
>
> A system analogous to ISO-8859-x whereby Ethiopic and other scripts
> in the 3-byte range could be shifted back into the 2-byte range
> might help (generally only English and Ethiopic are needed together).

Today is your lucky day. Check out Unicode Technical Standard #6, "A
Standard Compression Scheme for Unicode":

    http://www.unicode.org/unicode/reports/tr6/

SCSU uses 128-character windows to compress small alphabetic scripts to
almost 1 byte per character. Since Ethiopic (U+1200 to U+137F) occupies three
128-character half-blocks, SCSU must use three windows and switch between
them, but the overhead is still much lower than with UTF-8. In the worst case
(each character belongs to a different half-block than the one before), you
will still use only 2 bytes per character: a 1-byte window switch plus the
character byte, compared with 3 bytes per character in UTF-8.
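To make the window mechanism concrete, here is a minimal Python sketch of
SCSU's single-byte mode, restricted (by assumption) to ASCII plus Ethiopic.
It is not a complete UTS #6 encoder: it never enters Unicode mode and never
uses the quote tags, and the function names `encode_ethiopic_scsu` and
`decode_ethiopic_scsu` are hypothetical. On first use it defines dynamic
windows 0 to 2 at the three Ethiopic half-blocks (tag SDn plus an offset
byte), then switches among them with the 1-byte SCn tags; ASCII passes
through without changing the active window.

```python
# Sketch of SCSU (UTS #6) single-byte mode, restricted by assumption to
# ASCII plus Ethiopic U+1200..U+137F. Not a full SCSU implementation.

ETHIOPIC_START = 0x1200        # start of the first Ethiopic half-block
ETHIOPIC_END = 0x137F          # end of the third half-block
SC0 = 0x10                     # SC0..SC7: select dynamic window n (1 byte)
SD0 = 0x18                     # SD0..SD7: define window n and select it
ASCII_CONTROLS = {0x00, 0x09, 0x0A, 0x0D}  # controls passed through as-is


def encode_ethiopic_scsu(text: str) -> bytes:
    """Hypothetical helper: encode ASCII + Ethiopic in SCSU single-byte mode."""
    out = bytearray()
    defined = set()            # windows this encoder has already defined
    active = 0                 # SCSU starts with dynamic window 0 active
    for ch in text:
        cp = ord(ch)
        if 0x20 <= cp <= 0x7F or cp in ASCII_CONTROLS:
            out.append(cp)     # ASCII costs 1 byte and keeps the active window
        elif ETHIOPIC_START <= cp <= ETHIOPIC_END:
            window = (cp - ETHIOPIC_START) // 0x80  # half-block 0, 1 or 2
            offset = ETHIOPIC_START + window * 0x80
            if window not in defined:
                # SDn + offset byte (offset / 0x80): one-time 2-byte cost
                out += bytes([SD0 + window, offset // 0x80])
                defined.add(window)
                active = window
            elif window != active:
                out.append(SC0 + window)  # 1-byte window switch
                active = window
            out.append(0x80 + cp - offset)  # character byte within the window
        else:
            raise ValueError(f"U+{cp:04X} is outside this sketch's repertoire")
    return bytes(out)


def decode_ethiopic_scsu(data: bytes) -> str:
    """Hypothetical helper: decode the subset produced by the encoder above."""
    offsets = {}               # window number -> offset, as set by SDn tags
    active = 0
    chars = []
    i = 0
    while i < len(data):
        b = data[i]
        if SD0 <= b <= SD0 + 7:
            active = b - SD0
            offsets[active] = data[i + 1] * 0x80
            i += 2
            continue
        if SC0 <= b <= SC0 + 7:
            active = b - SC0
        elif b >= 0x80:
            if active not in offsets:
                raise ValueError("window used before this sketch saw it defined")
            chars.append(chr(offsets[active] + b - 0x80))
        elif b >= 0x20 or b in ASCII_CONTROLS:
            chars.append(chr(b))
        else:
            raise ValueError(f"tag 0x{b:02X} is not handled by this sketch")
        i += 1
    return "".join(chars)
```

For example, encoding "\u1230\u120b\u121d hi" (three Ethiopic syllables, a
space and "hi") yields 8 bytes, 2 of them for the one-time window definition,
where UTF-8 needs 12. A string that alternates between two half-blocks
settles at 2 bytes per character once both windows are defined.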

SCSU is fully supported by SC UniPad, a Unicode text editor that is currently
available for free. For more information, visit:

    http://www.unipad.org/

-Doug Ewell
 Fullerton, California



This archive was generated by hypermail 2.1.2 : Thu Jul 12 2001 - 12:44:50 EDT