RE: Unicode, SMS and year 2012

From: Doug Ewell <doug_at_ewellic.org>
Date: Fri, 27 Apr 2012 13:25:38 -0700

Mark Davis 🍍 <mark at macchiato dot com> wrote:

> Actually, if the goal is to get as many characters in as possible,
> Punycode might be the best solution. That is the encoding used for
> internationalized domains. In that form, it uses a smaller number of
> bytes per character, but a parameterization allows use of all byte
> values.

That might work well if the goal is to find a compact encoding into
7-bit code units and then pack 8 such code units into 7 bytes. It would
certainly be more economical than UTF-7-over-7, which is fine for ASCII
and awful for anything else.
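
To make the "8 code units in 7 bytes" arithmetic concrete, here is a minimal sketch in Python. The function names and the MSB-first bit order are assumptions made for illustration; in particular, this is not the bit layout that GSM SMS itself uses for its default alphabet.

```python
def pack_septets(units):
    """Pack 7-bit code units into bytes, MSB-first: 8 units fit in 7 bytes.

    Hypothetical helper for illustration; the bit order is an assumption.
    """
    acc = 0      # bit accumulator
    nbits = 0    # number of pending bits held in acc
    out = bytearray()
    for u in units:
        if not 0 <= u < 128:
            raise ValueError("code unit does not fit in 7 bits")
        acc = (acc << 7) | u
        nbits += 7
        while nbits >= 8:
            nbits -= 8
            out.append((acc >> nbits) & 0xFF)
            acc &= (1 << nbits) - 1   # drop the bits just emitted
    if nbits:
        # Zero-pad the final partial byte on the right.
        out.append((acc << (8 - nbits)) & 0xFF)
    return bytes(out)


def unpack_septets(data, count):
    """Inverse of pack_septets; count is the number of code units stored."""
    acc = 0
    nbits = 0
    units = []
    for b in data:
        acc = (acc << 8) | b
        nbits += 8
        while nbits >= 7 and len(units) < count:
            nbits -= 7
            units.append((acc >> nbits) & 0x7F)
            acc &= (1 << nbits) - 1
    return units
```

With this layout, 160 seven-bit code units occupy exactly 140 bytes, so any encoding whose output is pure ASCII can be carried this way at 7/8 of its nominal byte count.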

I don't usually think of Punycode as an ideal general-purpose
compression encoding, especially with lines of arbitrary length or
consisting primarily of non-ASCII content (Cristian's example), but it's
certainly worth experimenting with. One advantage might be that encoders
and decoders for Punycode already exist, probably in greater numbers
than for SCSU.
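
One way to run such an experiment is with the "punycode" and "utf-7" codecs that ship with Python. This is only a sketch: the helper name and the sample string are made up for illustration, and the packed size assumes the 8-units-in-7-bytes packing discussed above.

```python
import math


def septet_cost(text, codec):
    """Return (code_units, packed_bytes) for text under an ASCII-only codec.

    Hypothetical helper: code_units is the length of the encoded form,
    packed_bytes is that length after packing 8 seven-bit units per 7 bytes.
    """
    encoded = text.encode(codec)
    if any(b > 0x7F for b in encoded):
        raise ValueError("codec output is not 7-bit clean")
    n = len(encoded)
    return n, math.ceil(n * 7 / 8)


if __name__ == "__main__":
    # Sample text is an assumption chosen for illustration only.
    sample = "ședința începe mâine"
    for codec in ("punycode", "utf-7"):
        units, packed = septet_cost(sample, codec)
        print(f"{codec}: {units} code units, {packed} bytes when packed")
```

Results will vary a good deal with the script and the length of the text, so a single sample says little; a fair comparison would run this over a corpus of real messages.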

--
Doug Ewell | Thornton, Colorado, USA
http://www.ewellic.org | @DougEwell
Received on Fri Apr 27 2012 - 15:29:41 CDT
