Re: A UTF-8 based News Service

From: Daniel Yacob (unicode@abyssiniacybergateway.net)
Date: Fri Jul 13 2001 - 10:03:28 EDT


DougEwell2@cs.com wrote:
>
> As a test, I downloaded the first article on the page:
>
> http://unicode.ethiozena.net/Gazettas/Kibrit/Archives/1993/Hamle/05/Kibrit.051193.sera.html
>
> The article, dated 1993-05-11, has the formidable title:
>

Yesterday in the Ethiopian calendar :) <insert favorite Y2K joke here>

> «p-t negaso gidada wedeTalyan kobelelu teblo yeteseraCew zegeba f`Sum Heset
> new» yeTalyan Embasi
>

Titles (in <title> markup) remain transliterated, since a number of
browsers that support UTF-8 viewing in the page display area do not
support it in the "title" area of the browser's application window.
Transliterated Ethiopic actually fares better than UTF-8 for size, since
consonants can be a single byte, syllables 2 bytes, and diphthongs 3.
On average a document might "compress" with transliteration down to 53%
of its UTF-8 size. It is not so easy on the eyes, but it is useful as a
last resort.
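
To make the byte counts concrete, here is a minimal sketch in Python. The
sample word and its Latin transliteration ("selam") are illustrative
assumptions chosen for this example, not taken from the article, and the
helper name utf8_size is hypothetical. Each Ethiopic character occupies
3 bytes in UTF-8, while the ASCII transliteration needs 1 byte per letter.

```python
def utf8_size(text: str) -> int:
    """Return the number of bytes the text occupies when encoded as UTF-8."""
    return len(text.encode("utf-8"))


# Illustrative sample (an assumption, not from the article):
# U+1230 U+120B U+121D, three Ethiopic characters.
ethiopic = "\u1230\u120b\u121d"
# Assumed Latin transliteration of the same word: two syllables
# ("se", "la") at 2 bytes each plus a bare consonant ("m") at 1 byte.
translit = "selam"

ethiopic_bytes = utf8_size(ethiopic)
translit_bytes = utf8_size(translit)
ratio = translit_bytes / ethiopic_bytes

print(f"UTF-8 Ethiopic:  {ethiopic_bytes} bytes")
print(f"Transliteration: {translit_bytes} bytes")
print(f"Ratio:           {ratio:.0%}")
```

For this one word the transliteration comes to 5 bytes against 9, which
is in the same neighborhood as the 53% average. Real documents will vary
with their mix of consonants, syllables, and diphthongs.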

>
> Encoded in UTF-8, the file was 1891 bytes long. Converted into SCSU, it
> dropped to 1121 bytes, which is 40% shorter than the UTF-8 version, better
> than UTF-16, and probably better than any existing legacy encoding for
> Ethiopic. SCSU is a Good Thing.

Sounds promising! How well does SCSU gzip?
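
One way to measure this is sketched below in Python, under some loud
assumptions. Python's standard library has no SCSU codec, so the helper
(gzip_sizes, a hypothetical name) takes raw bytes and would need SCSU
output from an external encoder. As a stand-in, the demonstration only
compares UTF-8 and UTF-16 on a made-up repetitive sample, so it says
nothing about SCSU itself.

```python
import gzip
from typing import Tuple


def gzip_sizes(data: bytes) -> Tuple[int, int]:
    """Return (raw size, gzip-compressed size) in bytes for the given data."""
    return len(data), len(gzip.compress(data))


# Made-up sample: three Ethiopic characters plus a space, repeated.
# Real article text would compress far less dramatically than this.
sample = "\u1230\u120b\u121d " * 200

for encoding in ("utf-8", "utf-16-le"):
    raw, packed = gzip_sizes(sample.encode(encoding))
    print(f"{encoding:10s} raw={raw:5d} gzip={packed:4d}")

# SCSU bytes produced by an external encoder could be passed to
# gzip_sizes() in the same way to answer the question for a real file.
```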

/Daniel


