RE: UTF-8 syntax

From: Carl W. Brown (cbrown@xnetinc.com)
Date: Fri Jun 08 2001 - 16:20:07 EDT


Peter,

There is a standard Unicode sort order, the code point sort order. This
proposal calls for an alternate code point order by establishing a new
set of encoding schemes.

We not only introduce a dual sorting scheme but also new encoding forms that
are close enough to the standard ones that they will work 95% of the time. We
all know that this is a recipe for disaster. If they want a new encoding, it
should be different enough that it does not work at all.
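
To make the two points above concrete, here is a minimal Python sketch. The
function utf8s_encode is a hypothetical illustration, not part of any library;
it assumes the proposed UTF-8S encodes each UTF-16 code unit (including each
half of a surrogate pair) as its own UTF-8-style sequence. Under that
assumption, BMP-only text comes out byte-for-byte identical to UTF-8, while a
supplementary character sorts differently.

```python
def utf8s_encode(text: str) -> bytes:
    """Hypothetical sketch of UTF-8S (an assumption about the proposal):
    encode every UTF-16 code unit as if it were a code point of its own."""
    utf16 = text.encode("utf-16-be")
    out = bytearray()
    for i in range(0, len(utf16), 2):
        unit = int.from_bytes(utf16[i:i + 2], "big")
        # "surrogatepass" lets lone surrogate code units through the encoder.
        out += chr(unit).encode("utf-8", "surrogatepass")
    return bytes(out)


# U+FF5E is in the BMP; U+10000 is the first supplementary code point.
words = ["\U00010000", "\uff5e"]

# Binary order of standard UTF-8 matches code point order.
by_code_point = sorted(words, key=lambda s: s.encode("utf-8"))

# Binary order of the sketched UTF-8S matches UTF-16 code unit order.
by_utf8s = sorted(words, key=utf8s_encode)

# BMP-only text is indistinguishable between the two encodings.
bmp_only = "abc\u00e9\uff5e"
same_for_bmp = utf8s_encode(bmp_only) == bmp_only.encode("utf-8")
```

The two sorted lists come out in opposite orders, and the difference only
shows up once supplementary characters appear in the data, which is exactly
the "works 95% of the time" trap.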

Anyone who has been in the IT field for any time should be able to see that
this idea is an invitation to disaster. We now have a little mess to clean
up, but this proposal makes it a major mess.

Those who do not learn from history are doomed to repeat it. Look, for
example, at Unix. Many applications are written in Java just because it is
too much hassle to compensate for all of the differences between Unix
variants. Fortunately, some ideas like Microsoft's version of Java have died.
Let's hope that UTF-8S also dies.

Carl

-----Original Message-----
From: unicode-bounce@unicode.org [mailto:unicode-bounce@unicode.org] On
Behalf Of Peter_Constable@sil.org
Sent: Thursday, June 07, 2001 10:52 PM
To: unicode@unicode.org
Subject: Re: UTF-8 syntax

On 06/07/2001 09:37:45 PM Peter Constable wrote:

>>So if you are saying there is ambiguity in
>>UTF-8S, it should also apply to UTF-16, which does not make sense to me.
>
>You know what? After all my harping, you're absolutely right on that
>point.

I'm starting to wonder if I wasn't thinking this through enough and whether
I gave up too quickly. I'd need to think about it a little more.

The definitions have problems that need to be fixed, though, and they're
less clear for UTF-16 than they are for UTF-8. I'm becoming inclined to say
that any argumentation for or against UTF-8S on the basis of whether it
runs into problems with the definitions is a fruitless discussion at present,
since it is trying to make logical deductions from definitions that are not
adequately clear, not adequately explicit, and possibly also not internally
consistent.

- Peter

---------------------------------------------------------------------------
Peter Constable

Non-Roman Script Initiative, SIL International
7500 W. Camp Wisdom Rd., Dallas, TX 75236, USA
Tel: +1 972 708 7485
E-mail: <peter_constable@sil.org>


