Re: UTF-16 problems

From: toby_phipps@peoplesoft.com
Date: Mon Jun 11 2001 - 23:40:41 EDT


Jianping Yang <Jianping.Yang@oracle.com> wrote:
>> So far, I can claim that only Oracle provides fully UTF-8 and
>> UTF-16 support for RDBMS, but unfortunately, as we cannot change the
>> existing utf8 definition from Oracle 8i as backward compatibility, we
>> have to use a new character set name for it as AL32UTF8.

Michael (mitchka) Kaplan <mitchka@trigeminal.com> wrote:
>As many have pointed out, THIS will cause more confusion than just about
>anything else. Tex is the only one who said anything but he is not the
>only one to believe you are seriously undermining the standard with this
>decision. It certainly does a lot to hurt interoperability.

Yes, it will cause confusion. However, stability and 100% backwards
compatibility are overriding concerns. Given the choice between a little
confusion and breaking products that depend on you, I'd choose the
confusion every time.

Just as systems come to depend on UCD character names, users of database
systems come to depend on vendor naming conventions. Changing core API
name references is not something that any responsible vendor would do
without overwhelming support from their customer base. Since the database
character set is chosen once per database installation and is not visible
to the average user, I see no overwhelming reason for Oracle to change
this. I admit it is confusing at first. However, they do have it well
documented (and I can only assume it will be documented with even greater
clarity in their 9i release, where many additional Unicode features have
been added), and they also support the true, correct UTF-8 definition as
per ISO 10646 and TUS 3.0, under the name AL32UTF8.
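
To make the distinction concrete, here is a minimal Python sketch of the
two byte layouts. It assumes (this is my understanding, not something
stated in this thread) that the legacy UTF8 character set stores a
supplementary character as two 3-byte sequences, one per UTF-16 surrogate,
the scheme often called CESU-8, while AL32UTF8 uses the standard 4-byte
form. The function name cesu8_encode is hypothetical and is not an Oracle
API.

```python
def cesu8_encode(text: str) -> bytes:
    """Encode text CESU-8 style (assumed layout of the legacy UTF8 set).

    BMP characters are encoded exactly as in UTF-8. A supplementary
    character is first split into its UTF-16 surrogate pair, and each
    surrogate is then written as its own 3-byte sequence (6 bytes total).
    """
    out = bytearray()
    for ch in text:
        cp = ord(ch)
        if cp < 0x10000:
            # "surrogatepass" lets a lone surrogate already in the input
            # pass through instead of raising.
            out += ch.encode("utf-8", "surrogatepass")
        else:
            offset = cp - 0x10000
            high = 0xD800 + (offset >> 10)    # leading surrogate
            low = 0xDC00 + (offset & 0x3FF)   # trailing surrogate
            for unit in (high, low):
                out += chr(unit).encode("utf-8", "surrogatepass")
    return bytes(out)


if __name__ == "__main__":
    sample = "\U00010400"  # a supplementary-plane character
    print("CESU-8 style:", cesu8_encode(sample).hex(" "))
    print("true UTF-8:  ", sample.encode("utf-8").hex(" "))
```

Under that assumption, U+10400 comes out as ED A0 81 ED B0 80 from
cesu8_encode but as F0 90 90 80 from Python's standard "utf-8" codec.
Text that stays within the BMP is byte-identical in both, which is why
the difference goes unnoticed until supplementary characters show up.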

I see this as the same situation as the Unicode Consortium's refusal to
change UCD names even when they are incredibly misleading, as is the case
with U+20A0 EURO-CURRENCY SIGN. This is obviously not the "Euro currency
sign", regardless of its name. The description points to the appropriate
character for the real sign (U+20AC EURO SIGN). Oracle has had to do the
same thing with their UTF8 character set to ensure backwards compatibility
and stability: leave it as-is, but document very clearly that it may not
be what the user expects, and point them to an alternative character set
setting (AL32UTF8).
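
The frozen names are easy to check from code. A small sketch using
Python's standard unicodedata module (the helper char_name is a
hypothetical convenience wrapper, not part of any standard API):

```python
import unicodedata


def char_name(code_point: int) -> str:
    """Return the UCD character name for a code point."""
    return unicodedata.name(chr(code_point))


if __name__ == "__main__":
    # U+20A0 keeps its misleading name; U+20AC is the actual euro sign.
    for cp in (0x20A0, 0x20AC):
        print(f"U+{cp:04X} {char_name(cp)}")
```

Any code that looks characters up by name relies on those names never
changing, which is the same kind of dependence database users have on a
character set name.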

Toby.



This archive was generated by hypermail 2.1.2 : Fri Jul 06 2001 - 00:17:18 EDT