Re: Unicode & space in programming & l10n

From: Doug Ewell (dewell@adelphia.net)
Date: Thu Sep 21 2006 - 08:34:13 CDT

Next message: Philippe Verdy: "Re: Unicode & space in programming & l10n"

Previous message: Hans Aberg: "Re: Unicode & space in programming & l10n"
In reply to: Hans Aberg: "Re: Unicode & space in programming & l10n"
Next in thread: Hans Aberg: "Re: Unicode & space in programming & l10n"
Reply: Hans Aberg: "Re: Unicode & space in programming & l10n"

Hans Aberg <haberg at math dot su dot se> wrote:

> Another method, which enables compressing both characters (code
> points) and natural language words (sequences of code points), might
> be to make modified UTF-8, where the leading byte admits indicating
> two categories of numbers. (Continued below.)

Whatever you do, do NOT call it "UTF-anything."

I'm currently compressing names in the Unicode character list using a
variable-length byte-based scheme that encodes common words like LETTER
in one byte and rare words like SPATHI in two bytes. The range of trail
bytes is allowed to overlap the range of lead bytes, since backward
parsing doesn't matter for this specific application. It has some
characteristics in common with UTFs, but it isn't a UTF and I pledge not
to call it one.
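
A minimal sketch of a scheme along these lines, written in Python. It is not the actual format described above, whose details are not given here. The split point `single_limit`, the frequency ranking, and the function names `build_tables`, `encode`, and `decode` are all assumptions made for illustration. Byte values below `single_limit` stand for a common word on their own; values at or above it are lead bytes, and the trail byte that follows may take any value from 0x00 to 0xFF, so it overlaps the lead range and the stream can only be parsed forward.

```python
from collections import Counter

def build_tables(names, single_limit=0xC0):
    """Rank words by frequency so the most common get one-byte codes.

    single_limit is a hypothetical split point: byte values below it
    encode a word by themselves; values at or above it are lead bytes
    of a two-byte code.
    """
    counts = Counter(word for name in names for word in name.split())
    ranked = [word for word, _ in counts.most_common()]
    capacity = single_limit + (256 - single_limit) * 256
    if len(ranked) > capacity:
        raise ValueError("too many distinct words for this split point")
    index = {word: i for i, word in enumerate(ranked)}
    return ranked, index

def encode(name, index, single_limit=0xC0):
    """Encode a space-separated name as a sequence of word codes."""
    out = bytearray()
    for word in name.split():
        i = index[word]
        if i < single_limit:
            out.append(i)                   # common word: one byte
        else:
            hi, lo = divmod(i - single_limit, 256)
            out.append(single_limit + hi)   # lead byte
            out.append(lo)                  # trail byte, any value 0x00-0xFF
    return bytes(out)

def decode(data, ranked, single_limit=0xC0):
    """Decode by forward parsing only; trail bytes can look like lead bytes."""
    words = []
    pos = 0
    while pos < len(data):
        b = data[pos]
        if b < single_limit:
            words.append(ranked[b])
            pos += 1
        else:
            i = single_limit + (b - single_limit) * 256 + data[pos + 1]
            words.append(ranked[i])
            pos += 2
    return " ".join(words)
```

With the default split, 192 words fit in one byte and 16,384 more in two. Because a trail byte is indistinguishable from a lead or single byte when seen in isolation, a decoder cannot resynchronize from the middle of the stream or scan backward, which is the trade-off accepted in exchange for the larger two-byte range.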

--
Doug Ewell
Fullerton, California, USA
http://users.adelphia.net/~dewell/
RFC 4645  *  UTN #14


This archive was generated by hypermail 2.1.5 : Thu Sep 21 2006 - 08:36:13 CDT