From: Richard Wordingham (richard.wordingham@ntlworld.com)
Date: Mon May 29 2006 - 09:51:24 CDT
Doug Ewell wrote on Monday, May 29, 2006 at 1:13 AM
> I guess I shouldn't have said "much more efficiently" since GSM does use a
> 7-bit byte. You'd have to average one 2-byte character for every six
> 1-byte characters to break even, which generally isn't true for (e.g.)
> Spanish or German. Still, for Romanian and any other text where
> everything falls back to 2 bytes, SCSU would be a clear win.
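The break-even figure in the quoted paragraph can be checked with a little arithmetic. The sketch below assumes that an escaped GSM character costs two septets (14 bits), that an ordinary one costs one septet (7 bits), and that SCSU spends exactly one byte (8 bits) on every character once a suitable window is active; window set-up costs are ignored. The function and constant names are made up for illustration.

```python
from fractions import Fraction

GSM_SINGLE_BITS = 7    # one septet for an ordinary GSM character
GSM_ESCAPED_BITS = 14  # two septets for an escaped ("2-byte") GSM character
SCSU_BITS = 8          # assumption: one byte per character in SCSU


def gsm_bits(singles, escaped):
    """Bits used by GSM for the given mix of characters."""
    return singles * GSM_SINGLE_BITS + escaped * GSM_ESCAPED_BITS


def scsu_bits(singles, escaped):
    """Bits used by SCSU under the one-byte-per-character assumption."""
    return (singles + escaped) * SCSU_BITS


def break_even_singles_per_escaped():
    """Solve 7n + 14 = 8(n + 1) for n, the number of ordinary
    characters per escaped character at which both schemes tie."""
    return Fraction(GSM_ESCAPED_BITS - SCSU_BITS, SCSU_BITS - GSM_SINGLE_BITS)


if __name__ == "__main__":
    print(break_even_singles_per_escaped())
    print(gsm_bits(6, 1), scsu_bits(6, 1))
```

With fewer than six ordinary characters per escaped one, SCSU comes out ahead under these assumptions; with more, GSM's 7-bit packing wins.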
I see there's also an optional compressed mode in GSM, which should work
wonders for most language-specific alphabetic scripts. It's a bit clunky
when it has to switch between Unicode rows (i.e. the high 8 bits of each
UTF-16 code unit), taking at least 9 bits for each switch, which might make
it useless for Cree or possibly even for Vietnamese.
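To get a rough feel for how often such row switches happen, one can count the changes in the high byte of successive UTF-16 code units. This is only a sketch: it does not implement the GSM compression algorithm, it ignores whatever it costs to select the first row, and it simply multiplies the number of switches by the 9-bit lower bound mentioned above. The function names are made up for illustration.

```python
def utf16_rows(text):
    """Return the high byte ("row") of each UTF-16 code unit in text."""
    data = text.encode("utf-16-be")
    # In big-endian UTF-16 the high byte of each code unit comes first.
    return [data[i] for i in range(0, len(data), 2)]


def count_row_switches(text):
    """Count how many times the row changes between adjacent code units."""
    rows = utf16_rows(text)
    return sum(1 for prev, cur in zip(rows, rows[1:]) if prev != cur)


def switch_overhead_bits(text, bits_per_switch=9):
    """Lower bound on switching overhead, assuming 9 bits per switch."""
    return count_row_switches(text) * bits_per_switch


if __name__ == "__main__":
    vietnamese = "Vi\u1ec7t"          # U+1EC7 sits in row 0x1E, the rest in row 0x00
    cree = "\u1403\u152d\u1403"       # rows 0x14, 0x15, 0x14
    for sample in (vietnamese, cree):
        print(utf16_rows(sample), switch_overhead_bits(sample))
```

Cree syllabics are spread over rows 0x14 to 0x16, and Vietnamese mixes rows 0x00, 0x01 and 0x1E, so in both cases the row can change every few characters.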
Richard.