Re: Last Call: UTF-16, an encoding of ISO 10646 to Informational

From: Ned Freed (Ned.Freed@innosoft.com)
Date: Mon Aug 16 1999 - 00:12:34 EDT


> > > The idea of MIME is that the sender can tell the receiver what he has used.
> > > Nothing else.

> > Then it's a bad idea. Because it has led to the insupportable situation we
> > have now with email.

> The alternative is worse: people will still send proprietary stuff,
> (and some vendors will still build MUAs that favor use of their
> proprietary data formats, to try to make their competitors' products
> look bad), but they'll either do so without accurate labelling
> (e.g. mislabelling vendor-proprietary-codepage as iso-8859-1)
> or without precise labelling (e.g. labelling everything as
> application/octet-stream and expecting recipients to guess the
> content-type based on the filename suffix). In fact, both of these
> have been done, the products are widely deployed, and those products
> cause interoperability problems.

> Furthermore, the strategy you suggest has been tried, and it failed.
> Very early in the lifetime of MIME we tried to restrict content-type
> definitions to standard or interchange types, and we quickly found
> out that it didn't fly - people would either mislabel types, or
> they would make up their own type names without registering them.

Thank you very much, Keith, for essentially writing my response for me. The fact
of the matter is that we somewhat unwittingly generated test cases for several
different registration strategies in MIME. And it has now been long enough for
us to evaluate the results of these different procedures.

With transfer encodings we put the bar as high as we could -- standards track.
And we've had ongoing interoperability problems. To be fair, the specific bane
of the world of transfer encodings -- UUENCODE -- is mostly problematic in that
there are a large number of slightly inconsistent variants in actual use. But
some of the interoperability problems have been the result of there not being a
standard label for UUENCODE. This was a mistake.

For media types we originally required format information to register. And what
we ended up with was a disaster that is only now getting cleaned up. In
retrospect we were fools to suppose that we could limit in any way the
diversity of media types people will use. The result has been inconsistent use
of nonstandard labels. Check out any real world application that supports MIME
and you'll find it is chock full of nonstandard media types. Our new
registration process is much better and it seems to be working.

And then there are charsets. We tried a dual approach with charsets, where
we allowed registration of anything that's reasonably consistent and well
defined, but limited our endorsement for actual use to a small subset
of the registration space.

And this has worked reasonably well. The initial registration process was
pretty sloppy and still needs to be cleaned up, and we've had some problems
getting some charsets registered, but by and large we've managed to keep
nonstandard labels from proliferating and we've managed to migrate a fair
amount of use to the endorsed subset.

> At least when the labelling is accurate, the recipient has some chance
> of being able to do an appropriate conversion.

Darned right. It is usually easy to find definitions for pretty much anything
in actual use. But contrast this with the binary file that shows up as
application/octet-stream.

Now, I have absolutely no religion at all about where UTF-16 should be used,
and what variant or variants should be used where. But it is foolish to think
that by not approving labels we'll somehow keep people from using the other
variants some of the time. Been there, tried that, doesn't work.
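To make Keith's point about accurate labelling concrete for the UTF-16 case: the same octets decode to different text depending on which UTF-16 variant the label names. What follows is a minimal Python sketch, not anything from the draft under discussion. The helper decode_labelled is hypothetical and stands in for whatever a receiving MUA does with a declared charset, and the strings "utf-16", "utf-16-be" and "utf-16-le" are Python codec names, assumed here only for illustration.

```python
def decode_labelled(payload: bytes, charset: str) -> str:
    """Hypothetical receiver helper: decode a body using the declared charset."""
    return payload.decode(charset)


# The sender actually serialized "Hi" as little-endian UTF-16, without a BOM.
body = "Hi".encode("utf-16-le")  # b'H\x00i\x00'

# Accurate label: the receiver recovers the original text.
print(decode_labelled(body, "utf-16-le"))

# Wrong label: no error is raised, but each octet pair is read in the
# opposite order, yielding two unrelated CJK ideographs (U+4800 U+6900).
print(repr(decode_labelled(body, "utf-16-be")))

# With a byte order mark prepended, the generic "utf-16" codec
# can detect the byte order on its own.
print(decode_labelled(b"\xff\xfe" + body, "utf-16"))
```

The mislabelled case fails silently rather than loudly, which is the kind of interoperability problem described above: with an accurate label the recipient can convert, and with a wrong one it cannot even tell that anything went wrong.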

                                Ned


