UTF8 is not UTF-8 (was Re: UTF8 vs AL32UTF8)

From: Edward Cherlin (edward.cherlin.sy.67@aya.yale.edu)
Date: Sat Jun 09 2001 - 22:04:48 EDT


OK, the bottom line here is that Oracle goofed in implementing UTF-8,
and instead of fixing the mistake, either by renaming their
proprietary format or getting the data converted to the correct form,
they want to pass the error off on us, thus making things even worse.

I have a suggestion.

Say, "Oops!" loudly and publicly.

Name UTF-8 correctly in the next release, and rename UTF8 to
something that shows its proprietary nature.

Tell your customers that "UTF8" is not UTF-8, even if it sorts
quicker, and must be converted whenever it is exported.

Offer to convert any "UTF8" data to UTF-8 for free.

Drop the idea of getting a Unicode standard around your error. If you
and other database providers want to create such a standard, do it
yourselves, and jolly good luck to you! (You'll need it.)
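
To make the conversion concrete, here is a minimal Python sketch. It assumes that "UTF8" data differs from UTF-8 only in the way this thread describes for UTF-8S: each supplementary character is stored as a surrogate pair, three bytes per surrogate, instead of as one four-byte sequence. The function name is hypothetical, not any vendor's API.

```python
def oracle_utf8_to_utf8(data: bytes) -> bytes:
    """Convert surrogate-pair style "UTF8" bytes to standard UTF-8 (sketch)."""
    # Decode, letting the 3-byte surrogate sequences through as lone surrogates.
    text = data.decode("utf-8", errors="surrogatepass")
    # Round-trip through UTF-16 so each high/low pair fuses into one code point.
    # The strict UTF-16 decode raises UnicodeDecodeError on an unpaired surrogate.
    text = text.encode("utf-16-le", errors="surrogatepass").decode("utf-16-le")
    return text.encode("utf-8")


# U+10400 stored as two 3-byte surrogate sequences (D801, DC00).
legacy = b"\xed\xa0\x81\xed\xb0\x80"
print(oracle_utf8_to_utf8(legacy).hex())  # f0909080
```

Data that contains no supplementary characters passes through unchanged, so the conversion is safe to apply to every export.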

I have an alternative suggestion.

We all know that the internal formats in a database are frequently
different from those presented to users. So you can keep your format
as long as you *always* convert it to UTF-8 externally. In that case,
too, you can leave us out of it.

At 3:56 PM -0700 6/8/01, Jianping Yang wrote:
>Carl,
>
>"Carl W. Brown" wrote:
> > Looking at your documentation you call UTF-8S UTF8 and standard UTF-8
> > AL32UTF8. To me this is very misleading.
>
>We clearly documented what character set definition for UTF8 and
>AL32UTF8 in our manual. If you look at them

Oh, you have customers who read and understand the documentation? Can
I have some of them?

>you should easy map UTF8 to UTF-8S and AL32UTF8 to UTF-8.

You refuse to abide by the Unicode standard, and you want *us* to
"fix" your problem?!? I don't think so.

It has been made plain on this list that you have technical answers
to your desiderata (which are *not* requirements). You claim that you
can save processing time by storing data in a non-standard format and
lying about it to your customers. I claim that any such savings will
be lost many times over due to errors in identifying the encodings
and to otherwise unnecessary conversions.
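
Both sides of that trade can be seen in a few lines of Python. This is a sketch under the assumption that the non-standard format is the UTF-8S layout discussed in this thread (surrogate pairs, three bytes per surrogate), and that the claimed saving is, as I understand the argument, that its byte order matches UTF-16 code unit order. `utf8s_encode` is a hypothetical helper written only for this illustration.

```python
def utf8s_encode(text: str) -> bytes:
    """Encode text with supplementary characters as surrogate pairs (sketch)."""
    out = bytearray()
    for ch in text:
        cp = ord(ch)
        if cp >= 0x10000:
            # Split into a UTF-16 surrogate pair, then encode each half as 3 bytes.
            cp -= 0x10000
            pair = chr(0xD800 + (cp >> 10)) + chr(0xDC00 + (cp & 0x3FF))
            out += pair.encode("utf-8", errors="surrogatepass")
        else:
            out += ch.encode("utf-8")
    return bytes(out)


bmp = "\uff5e"        # a BMP character above the surrogate range
supp = "\U00010400"   # a supplementary character

# The surrogate-pair form sorts the way UTF-16 code units do...
utf16_order = supp.encode("utf-16-be") < bmp.encode("utf-16-be")
utf8s_order = utf8s_encode(supp) < utf8s_encode(bmp)
# ...while real UTF-8 sorts by code point, which gives the opposite answer here.
utf8_order = supp.encode("utf-8") < bmp.encode("utf-8")

# A conforming UTF-8 decoder refuses the surrogate-pair bytes outright.
try:
    utf8s_encode(supp).decode("utf-8")
    rejected = False
except UnicodeDecodeError:
    rejected = True

print(utf16_order, utf8s_order, utf8_order, rejected)  # True True False True
```

So the two forms disagree on ordering for exactly the characters where they disagree on bytes, and anything labelled UTF-8 that carries these bytes fails at the first strict consumer.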

Now if you want to store data in your format, but always pass it
around in legal UTF-8, you get your internal performance benefit
without bugging any of us. I don't know of any way to handle
transfers between databases other than asking what the encodings are
at source and destination, and doing the conversion if necessary.
*You* can set things up so that databases in your "UTF8" format can
exchange data directly, although I don't know why you would need to.
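
The transfer rule above can be sketched in Python as follows. The labels "UTF8" and "AL32UTF8" are the ones used in this thread; the function names are hypothetical, and the sketch assumes "UTF8" means the surrogate-pair layout (three bytes per surrogate) while "AL32UTF8" means standard UTF-8.

```python
def decode_labelled(data: bytes, label: str) -> str:
    """Decode bytes according to the encoding label the source reports."""
    if label == "AL32UTF8":
        return data.decode("utf-8")
    if label == "UTF8":
        # Let surrogates through, then fuse pairs via a UTF-16 round trip.
        text = data.decode("utf-8", errors="surrogatepass")
        return text.encode("utf-16-le", errors="surrogatepass").decode("utf-16-le")
    raise ValueError(f"unknown encoding label: {label}")


def encode_labelled(text: str, label: str) -> bytes:
    """Encode text into the form the destination says it expects."""
    if label == "AL32UTF8":
        return text.encode("utf-8")
    if label == "UTF8":
        # Turn each UTF-16 code unit into its own character, then encode each
        # one separately so surrogates come out as 3-byte sequences.
        units = text.encode("utf-16-le")
        as_units = "".join(
            chr(int.from_bytes(units[i:i + 2], "little"))
            for i in range(0, len(units), 2)
        )
        return as_units.encode("utf-8", errors="surrogatepass")
    raise ValueError(f"unknown encoding label: {label}")


def transfer(data: bytes, source: str, destination: str) -> bytes:
    """Ask both ends what they use; convert only if the answers differ."""
    if source == destination:
        return data  # direct exchange, no conversion needed
    return encode_labelled(decode_labelled(data, source), destination)


print(transfer(b"\xed\xa0\x81\xed\xb0\x80", "UTF8", "AL32UTF8").hex())  # f0909080
```

Everything here depends on the labels being truthful, which is the whole point: a database that calls the surrogate-pair form "UTF-8" defeats the check.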

-- 

Edward Cherlin, Generalist
"It isn't what you don't know that hurts you, it's what you know
for certain that just ain't so."--Mark Twain, Josh Billings, Edwin
Howard Armstrong, Will Rogers, Satchel Paige (following Thomas Jefferson)


