Playing with Unicode (was: Re: UTF-17)

From: DougEwell2@cs.com
Date: Sat Jun 23 2001 - 14:40:22 EDT


In a message dated 2001-06-22 23:08:11 Pacific Daylight Time,
michka@trigeminal.com writes:

> > Oh yeah, well, I can be more tongue-in-cheek than all of you. I've
> > already implemented it.
>
> Doug, this is one of those things one should be ashamed of, like believing
> in the April Fool's Day message about "self serve encodings" enough to have
> put together a proposal to send out for it!

I'm never ashamed of perfectly good code I've written to fulfill a humorous
requirement. I'm only ashamed of badly written code, or code that implements
a bad idea that someone else thinks is a good idea.

And I'm certainly not ashamed of having played along with Sarasvati's April
Fool's joke about proposing your own self-serve encoding by submitting LATIN
SMALL LIGATURE CT. (BTW, did anyone notice the irony in that response? The
Unicode Consortium had already said they wouldn't encode any more
compatibility ligatures, even when requested through the normal channels...
although now I wonder if they would be swayed by large, influential companies
applying a lot of pressure.)

But I do enjoy playing with Unicode, experimenting with new UTFs and TESs and
comparing them with others, partly because I believe it has pedagogical
value. It teaches me something about Unicode. For example, I have learned a
lot about conformance criteria and definitions as a result of studying
transformation formats that do and do not meet the criteria. Such
experimentation is always harmless, and sometimes useful.

There seem to be a lot of people playing with Unicode lately, and a lot of
other people who don't get the joke or are having trouble telling the serious
proposals from the whimsical ones. This doesn't mean these people are
humorless, just that the line between serious and whimsical is getting
blurred. Maybe, to some, Unicode doesn't seem a likely source of humor.

Someone just suggested to Ken Whistler, in response to his UTF-17 posting,
that the world already had enough UTFs and didn't need any more, as if Ken
(of all people) didn't already know that.

I have generally avoided littering my posts, on the Unicode list and
elsewhere, with :-) and <g> and friends, assuming that the humor was obvious
enough to be detected without such markup. On the Unicode list I have
sometimes used \u263a (a Unicode-related mini-joke all by itself), but still
sparingly.

To keep well-meaning people from misinterpreting humorous UTF proposals as
serious, while still allowing the levity to flow freely, I hereby propose
that UTFs proposed in a non-serious light be indicated in lower-case letters
(e.g. utf-64, utf-17) while the serious UTFs and proposals should remain in
upper-case (e.g. UTF-8, UTF-16).

UTF-8S is a serious proposal (even if many of us disagree as to its value),
so it should be written "UTF-8S". I henceforth pledge to stop writing
"UTF-8s" since the mixture of upper- and lower-case letters might dilute the
intent of this new UTF notation proposal.

BOCU is a serious technique, although not a formal technical spec, so
all-caps would be correct.

Doug's Unicode Compression Kludge, which I described in 1998, is not a
serious proposal, and so it should be abbreviated as "duck" rather than
"DUCK."
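
The case convention proposed above is mechanical enough to sketch in a few
lines of Python. This is only an illustration under stated assumptions: the
function name classify_proposal_name and the labels "serious", "whimsical",
"ambiguous", and "unknown" are hypothetical and not part of the proposal, and
only the letters in a name are examined (digits and hyphens are ignored).

```python
def classify_proposal_name(name: str) -> str:
    """Classify a UTF/TES name by letter case (hypothetical helper).

    All upper-case letters -> "serious", all lower-case -> "whimsical",
    a mixture -> "ambiguous", no letters at all -> "unknown".
    """
    # Only letters carry the signal; digits and punctuation are skipped.
    letters = [ch for ch in name if ch.isalpha()]
    if not letters:
        return "unknown"
    if all(ch.isupper() for ch in letters):
        return "serious"
    if all(ch.islower() for ch in letters):
        return "whimsical"
    # Mixed case, such as "UTF-8s", dilutes the intent of the notation.
    return "ambiguous"


if __name__ == "__main__":
    for name in ["UTF-8", "UTF-16", "UTF-8S", "BOCU", "utf-17", "utf-64", "duck", "UTF-8s"]:
        print(name, "->", classify_proposal_name(name))
```

Under this sketch "UTF-8s" comes out as ambiguous, which is the dilution the
pledge above is meant to avoid.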

Hopefully this form of tagging will prevent people from worrying about joke
proposals contributing to the proliferation of UTFs, so we can get on with
the business of worrying about real proposals contributing to the
proliferation of UTFs.

-Doug Ewell
 Fullerton, California
