Re: Unicode forms for internal storage - BOCU-1 speed

From: Doug Ewell (dewell@adelphia.net)
Date: Fri Jan 23 2004 - 01:20:06 EST

Next message: Doug Ewell: "Re: Unicode forms for internal storage - BOCU-1 speed"

Previous message: Kenneth Whistler: "Re: Unicode forms for internal storage - BOCU-1 speed"
In reply to: Kenneth Whistler: "Re: Unicode forms for internal storage - BOCU-1 speed"
Next in thread: Doug Ewell: "Re: Unicode forms for internal storage - BOCU-1 speed"

Kenneth Whistler <kenw at sybase dot com> wrote:

>> I have seen several other informal proposals for "UTF-*" forms/
>> schemes. All this is just confusive, and their authors should imagine
>> their own names for reference. What do you think of this idea?
>
> It is, indeed, "confusive". Some of us have deliberately contributed
> to the confusion with tongue-in-cheek additions. See my own
> UTF-17 (draft-whistler-utf17-00.txt). I would not object if
> henceforward people referred to that as KW-UTF-17, to avoid
> confusion. :-)

A couple of years ago I suggested calling these "XTF's," to distinguish
them from the official "UTF's."

I've added a bit to the confusivity, with ha-ha-only-serious schemes
called DUCK (Doug's Unicode Compression Kludge) and MUCK (Multigraph
Unicode Compression Kludge), plus something I called "dynamic code
pages" which never saw the light of day, and probably never will because
of their really, really bad performance.

But mostly I've carried other people's jokes (and serious proposals) to
the logical extreme and beyond, by creating fully functional and tested
implementations of:

• UTF-4 by Jill Ramonsky (name provided by John Cowan)
• UTF-5 by James Seng, Martin Dürst, and Tin Wee Tan
• UTF-7d5 by Jørg Knappen
• UTF-8C1 by Markus Scherer
• UTF-9 by Jerome Abela (not Mark Crispin's version)
• UTF-17 by Ken
• UTF-24 by Pim Blokland
• UTF-64 by Marco and Paul Keinänen
• UTF-mu by Marco
• UTF-Z by Marco
• XTF-3 by Shlomi Tal

as well as some more serious formats:

• UTF-1 (the "original" Unicode Transformation Format)
• UTF-EBCDIC (described in meticulous detail in UTR #16)

Currently I'm working on a much more useful project: a "clean-room"
encoder and decoder for BOCU-1, possibly the world's first that doesn't
just wrap the UTN #6 sample code.

And on the lighter side, I recently dredged up Misha Wolf's original
1995 description of RCSU, the predecessor of SCSU, and started
experimenting with an encoder and decoder:

http://www.unicode.org/mail-arch/unicode-ml/Archives-Old/UML001/0242.html

-Doug Ewell
Fullerton, California
http://users.adelphia.net/~dewell/

This archive was generated by hypermail 2.1.5 : Fri Jan 23 2004 - 02:04:40 EST