Re: UTF-5 specification

From: Doug Ewell (dewell@compuserve.com)
Date: Fri Mar 03 2000 - 10:21:32 EST


Bob Rosenberg wrote:

> You are not looking at the problem correctly.

Ken Whistler wrote:

> No wonder Doug was confused about how to implement an encoder/decoder
> for this.

and James Seng wrote (privately):

> However, I think you have some misunderstanding about UTF-5.

OK, I get the message... I'm confused! But I contend that the purpose of
a specification is to resolve confusions and ambiguities, not increase
them. If I am more confused AFTER reading a spec than I was BEFORE I
read it, something is wrong, and not just with me.

I have implemented encoders and decoders for UTF-8, UTF-7, UTF-7.5,
uuencode/uudecode, xxencode/xxdecode, base64, BinHex, atob/btoa, and
others. In each of these cases there was a specification (unofficial in
the case of BinHex) that told me exactly how to proceed. That is the
light up to which I am holding the UTF-5 specification.

From the responses, I gather that my original comment:

> It appears that UTF-5 was designed solely to allow non-ASCII characters
> in Internet domain names and e-mail addresses

was correct, and I suggest that the examples of "A<NOT IDENTICAL TO>
<ALPHA>." and "Hi Mom <WHITE SMILING FACE>!", which were purloined from
other UTF-* specifications, should be removed, as they lead the easily
confused reader to the conclusion that UTF-5 is suitable for general
purposes.

Instead, the authors might wish to expand on the intended use of UTF-5
for domain names and e-mail addresses, including a list of characters
that should be considered "delimiters" for creating these compound-UTF-5
hybrid strings. For example, '/' and ':' would be appropriate delimiters
as well as '@' and '.'.
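
To illustrate the delimiter idea, here is a minimal sketch in Python. It
is not taken from the UTF-5 draft: the names encode_compound and
encode_label and the delimiter set are hypothetical, and the actual
per-label encoding is deliberately left as a caller-supplied function.

```python
import re
from typing import Callable

# Delimiters suggested in the message above; the exact set is an assumption.
DELIMITERS = "@./:"


def encode_compound(text: str,
                    encode_label: Callable[[str], str],
                    delimiters: str = DELIMITERS) -> str:
    """Encode each run between delimiters, leaving the delimiters as-is.

    encode_label is a hypothetical callback standing in for whatever
    per-label encoding (for example UTF-5) the specification defines.
    """
    # The capturing group makes re.split keep the delimiters, so the
    # result alternates: text, delimiter, text, delimiter, ...
    pattern = "([" + re.escape(delimiters) + "])"
    parts = re.split(pattern, text)

    out = []
    for i, part in enumerate(parts):
        if i % 2 == 1 or not part:
            # Odd positions are delimiters; empty runs need no encoding.
            out.append(part)
        else:
            out.append(encode_label(part))
    return "".join(out)
```

With a stand-in encoder such as str.upper, the input "user@example.com"
would keep its '@' and '.' in place while each label between them is
transformed separately. Whether plain ASCII labels should also pass
through the encoder is exactly the kind of question the spec would need
to answer.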

As an aside, Ken's explanation of the difference between character
encoding schemes and transfer encoding schemes was authoritative and
interesting as always, but it left me (oh dear) confused again: How is
UTF-5 different from UTF-7 in this regard? Ken wrote:

> TES's are things like base64, uuencode, BinHex, quoted-printable, etc.,
> that are designed to convert textual (or other) data into sequences of
> byte values that avoid particular values that would confuse one or more
> Internet or other transmission/storage protocols.

Gosh, that sounds like UTF-7 -- avoiding certain byte values that may not
be permissible in RFC 822 e-mail. What's the difference? Is UTF-7 not
a true UTF either by this definition?

Thanks for the feedback.

-Doug Ewell
 Fullerton, California


