Re: CESU-8 vs UTF-8

From: Michael (michka) Kaplan (michka@trigeminal.com)
Date: Sat Sep 15 2001 - 16:17:30 EDT


Carl, Doug,

The issues you and Doug brought up were vigorously discussed. For the
decision, all I can say is that not everyone voted for it (which will be a
matter of public record once the preliminary minutes are posted).

D> This section of the TR amazed me. In the Summary and
D> elsewhere, CESU-8 "is not intended nor recommended as
D> an encoding used for open information exchange," but by
D> the end of the document we learn that it will be registered
D> with the Internet Assigned Numbers Authority. I have
D> spelled out IANA for a reason, to highlight that it is a body
D> dealing with open information exchange over the Internet.

...

D> This completely refutes all of the "internal use only" claims
D> made in the rest of the document.

Yes, there are many such issues. This is, however, more of a side effect of
how much the document *changed* from the original document, based on
feedback.

Many people believe that any rule or law that makes no sense or cannot be
enforced weakens all other laws. I believe that publishing an inconsistent
document would allow any reasonably intelligent reader to come to the
same conclusions as you did, and that the standard itself would be weakened
thereby.

...

D> I suggest, as part of the Proposed Draft stage for this document,
D> that Section 4 be deleted and that IANA be informed that CESU-8
D> is intended as an internal encoding only and that they are explicitly
D> requested NOT to register it.

C> In actuality Section 4 neither adds nor takes away from PDUTR #26.
C> They can either apply to IANA or not if Section 4 is included or not.
C> It is merely a notification that there is no intent to make CESU-8 a
C> private protocol.

The argument was put forward [unconvincingly, in my eyes] that the only way
to protect the situation from having some other vendor register it with IANA
would be to do so in a pre-emptive manner. I, however, work on the
assumption that IANA is not populated by morons and that they would be at
least willing to hear from the UTC on the inadvisability of supporting any
such encoding, no matter who presents it.

No guarantees of course (there never are any) but I am sure they would be
willing to consider the desire of the UTC to not further litter the playing
field?

C> PDUTR #26 should be rejected in its entirety. If it is truly a private
C> protocol as they claim it does not belong in any form in the Unicode
C> standard.

I concur.

The argument was made that it should be tied to the [orthogonal, in my eyes]
argument of tightening up Unicode 3.2's UTF-8 definition to disallow the
6-byte form. In my eyes, however, it is perfectly acceptable to claim that,
in order to be compliant with the Unicode 3.2 definition of UTF-8, one must
not use the 6-byte form but that prior versions would allow one to accept it
(if they so desired).

Thus you can make one change without *requiring* the other.
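
For anyone not familiar with the "6-byte form": it is what you get when a
supplementary character is first split into a UTF-16 surrogate pair and each
surrogate is then written out as a 3-byte sequence, rather than writing the
code point directly as 4 bytes. A minimal sketch in Python follows; the
helper names cesu8_encode and is_strict_utf8 are made up for illustration
and do not come from the TR, and Python's built-in strict "utf-8" decoder
stands in here for a decoder that disallows the 6-byte form.

```python
def cesu8_encode(text: str) -> bytes:
    """Encode text so that supplementary characters use the 6-byte form.

    Hypothetical helper for illustration only: each code point above
    U+FFFF is split into a UTF-16 surrogate pair, and each surrogate is
    then written as its own 3-byte sequence.
    """
    out = bytearray()
    for ch in text:
        cp = ord(ch)
        if cp >= 0x10000:
            # Split into high and low surrogates, as UTF-16 would.
            offset = cp - 0x10000
            high = 0xD800 | (offset >> 10)
            low = 0xDC00 | (offset & 0x3FF)
            for surrogate in (high, low):
                # "surrogatepass" lets Python emit the 3-byte form of a
                # surrogate, which the strict codec would refuse.
                out += chr(surrogate).encode("utf-8", "surrogatepass")
        else:
            out += ch.encode("utf-8", "surrogatepass")
    return bytes(out)


def is_strict_utf8(data: bytes) -> bool:
    """Return True if data decodes under Python's strict UTF-8 codec."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


ch = "\U00010000"                  # first supplementary code point
four_byte = ch.encode("utf-8")     # F0 90 80 80
six_byte = cesu8_encode(ch)        # ED A0 80 ED B0 80
print(four_byte.hex(), six_byte.hex())
print(is_strict_utf8(four_byte), is_strict_utf8(six_byte))
```

A decoder that rejects the 6-byte form refuses the second sequence, while a
more lenient one could still accept it, which is the distinction drawn above.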

Since the only clients who would emit CESU-8 (PeopleSoft, et al.) are doing
so privately, no UTR is needed for them to do so. And there is a [prior]
version of the standard that can accommodate them.

C> You may have heard about hijacking legislative bills. It is taking an
C> existing bill and amending it to change the entire text of the bill. I
C> think that we should hijack PDUTR #26 and replace it with UTF-17.
C>
C> In actuality we should hijack PDUTR #26 to modify TR27 to specify
C> that at a minimum, systems that support UTF-16 must provide code
C> point order support services. We should delete all references to
C> CESU-8 and reject the idea of adding CESU-8 to the standard.

I do not know that the former is required; but either way I agree that
CESU-8 (née UTF8-S) should not be included even as a UTR.

However, it is not possible to "hijack" the current proposal as the author
does not wish this to happen... though I suppose you are welcome to try and
convince him? :-)

MichKa

Michael Kaplan
Trigeminal Software, Inc.
http://www.trigeminal.com/


