From: Doug Ewell (dewell@adelphia.net)
Date: Mon May 29 2006 - 12:08:10 CDT
Richard Wordingham <richard dot wordingham at ntlworld dot com> wrote:
> I see there's also an optional compressed mode in GSM, which should
> work wonders for most language-specific alphabetic scripts.
Can you provide a reference to that compressed mode? I couldn't find it
on the page Cristian mentioned, and a very casual search for "GSM
character set compressed" led me to several descriptions of the standard
7-bit ASCII-based encoding and to a proposal paper that achieves 5 bits
per character by splitting the uppercase and lowercase ASCII letters into
"groups" (similar to PTTC of 50 years ago) and allowing only 5
non-alphabetic characters.
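To make the arithmetic of a grouped 5-bit scheme concrete, here is a minimal Python sketch. The code layout is a hypothetical one chosen only because it fills exactly 32 codes: codes 0 to 25 are letters shared by a lowercase and an uppercase group, codes 26 to 30 are five assumed non-alphabetic characters, and code 31 is an assumed group shift. It is not the proposal paper's actual table, and the names `encode5` and `cost_bits` are invented for illustration.

```python
from typing import List

# Hypothetical 5-bit "grouped" code, loosely modelled on the proposal described
# above. All code assignments here are assumptions made for illustration.
PUNCT = " .,?!"  # assumed set of 5 non-alphabetic characters (codes 26..30)
SHIFT = 31       # assumed code that toggles between lowercase and uppercase groups


def encode5(text: str) -> List[int]:
    """Encode text as a list of 5-bit codes (values 0..31).

    Letters a..z / A..Z share codes 0..25; which case is meant depends on the
    current group, changed by emitting SHIFT. Punctuation is valid in both groups.
    """
    codes: List[int] = []
    upper = False  # assume every message starts in the lowercase group
    for ch in text:
        if ch in PUNCT:
            codes.append(26 + PUNCT.index(ch))
            continue
        if not (ch.isascii() and ch.isalpha()):
            raise ValueError(f"not representable in this 5-bit code: {ch!r}")
        if ch.isupper() != upper:
            codes.append(SHIFT)  # one extra 5-bit code per group change
            upper = not upper
        codes.append(ord(ch.lower()) - ord("a"))
    return codes


def cost_bits(text: str) -> int:
    """Total size in bits: every code, including shifts, costs 5 bits."""
    return 5 * len(encode5(text))


if __name__ == "__main__":
    msg = "Hello World"
    print(cost_bits(msg), "bits vs", 7 * len(msg), "bits in a 7-bit code")
```

For "Hello World" the four case changes add four shift codes, so the saving over a flat 7-bit code nearly disappears (75 bits vs 77). That is the usual trade-off of shift-based codes such as PTTC: cheap for single-case text, costly for mixed case.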
> It's a bit clunky when having to switch between Unicode rows (i.e.
> high 8 bits of UTF-16), taking at least 9 bits for each switch, which
> might make it useless for Cree or possibly even for Vietnamese.
Not surprisingly, Inuktitut and Vietnamese are the two writing systems I
mentioned in UTN #14 as not being compressed well by SCSU (Inuktitut, like
Cree, is written in Canadian Aboriginal Syllabics). The cause is SCSU's
approach of switching among 128-character windows, combined with the
dispersal of the characters of those writing systems across multiple
windows. A system that uses 256-character windows (rows) instead of 128
would have the same problem, and probably an even worse one for Vietnamese.
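A rough way to see how dispersal across windows turns into overhead is to count window moves. The Python sketch below uses a deliberately simplified model that is my own assumption, not SCSU or the GSM compressed mode: ASCII passes through untouched, every other character must fall in a single active window aligned to a multiple of the window size, and each move counts as one switch. The 9 bits per switch used in the usage lines comes from Richard's "at least 9 bits" estimate for row switches; the function names are hypothetical.

```python
from typing import Optional


def count_window_switches(text: str, window_size: int) -> int:
    """Count how often a single active window must move to cover `text`.

    Simplified model (an assumption, not SCSU or the GSM mode themselves):
    ASCII (below U+0080) passes through without touching the window; every
    other character must lie in the current window, whose start is aligned
    to a multiple of `window_size`. Moving the window counts as one switch.
    """
    current: Optional[int] = None
    switches = 0
    for ch in text:
        cp = ord(ch)
        if cp < 0x80:
            continue
        window = cp // window_size
        if window != current:
            switches += 1
            current = window
    return switches


def switch_overhead_bits(text: str, window_size: int, bits_per_switch: int) -> int:
    """Bits spent on window switches alone (character payload not included)."""
    return count_window_switches(text, window_size) * bits_per_switch


if __name__ == "__main__":
    # Precomposed (NFC) Vietnamese, written with escapes so the code points are
    # unambiguous: "Đường phố Hà Nội".
    sample = "\u0110\u01b0\u1eddng ph\u1ed1 H\u00e0 N\u1ed9i"
    for size in (128, 256):
        print(f"window size {size}: {count_window_switches(sample, size)} switches")
    # Lower bound for 256-character rows at 9 bits per switch.
    print(switch_overhead_bits(sample, 256, 9), "bits of row-switch overhead")
```

The model leaves out SCSU's eight dynamic windows and its single-character quote tags, both of which let a real encoder avoid many of these switches, so the counts only indicate how often a text leaves its current window, not what any actual encoder would emit.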
--
Doug Ewell
Fullerton, California, USA
http://users.adelphia.net/~dewell/
This archive was generated by hypermail 2.1.5 : Mon May 29 2006 - 12:26:43 CDT