Re: And Visions of Sugar Plum UTF-8's Dance in Their Heads

From: Kenneth Whistler (kenw@sybase.com)
Date: Tue Jun 12 2001 - 22:50:06 EDT


Jianping said:

> > What you finally stated today is that <F0 90 80 80> is flat-out
> > *illegal* in UTF-8s. That was a missing piece of the puzzle for anyone
> > trying to interpret what you are proposing.
> >
>
> In the UTF-8S, there should be no irregular forms, should we repeat the history again?
> Nobody except you thought that 4-byte is allowed in UTF-8S.

False. Do I have to dig out chapter and verse from the email to
show you? Peter Constable certainly did -- and asked you about
it.

Given that UTF-8 already exists and will continue to exist and will be
confused with UTF-8s, it seems incumbent upon you and other proposers
of UTF-8s to produce a very clear specification of exactly how UTF-8
relates to the proposed UTF-8s.

So far, getting the detailed questions answered has been like pulling
teeth.

> > > That's also your perception but not Oracle as we already support standard UTF-8
> > > encoding in 9i.
> >
> > How is Oracle's support for standard UTF-8 relevant to the conceptual
> > definition of UTF-8s?
>
> That means we do recognize U-00010000 in our implementation for UTF formats.

How is Oracle's support for supplementary characters relevant to my
first question?

> >
> > Now please answer the question for UTF-32 under your formulation of
> > UTF-8s.
>
> My answer here is quite simple:
>
> The UTF-8S code unit sequence <ED A0 80 ED B0 80> *always* corresponds to U+10000.
> It also always corresponds to the UTF-32 code unit sequence <00010000>
> and the UTF-8 code unit sequence <F0 90 80 80>.
>
> No ambiguities, no mapping issues.

You have again conveniently ignored the question that Peter posed to you
days ago, and which I raised explicitly in the k and l lines of the
comparisons derived from Mark's summary. What do you do with the following
sequence of code points:

<U-0000D800, U-0000DC00>

What is the UTF-8s and UTF-32 representation of that sequence, in your
analysis? And does it or does it not introduce an ambiguity of representation?
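
To make the question concrete, here is a minimal sketch in Python. It
assumes that UTF-8s encodes a supplementary code point by splitting it
into its UTF-16 surrogate pair and writing each surrogate as a 3-byte
sequence, and that a surrogate code point arriving on its own is written
with the same 3-byte bit layout. The second assumption is exactly what
the proposal has not specified; the function names are hypothetical and
are not taken from any proposal text.

```python
def utf8_encode_cp(cp: int) -> bytes:
    """Plain UTF-8 bit layout for one code point (no surrogate check)."""
    if cp < 0x80:
        return bytes([cp])
    if cp < 0x800:
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp < 0x10000:
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    return bytes([0xF0 | (cp >> 18),
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])


def utf8s_encode(code_points) -> bytes:
    """Assumed UTF-8s behavior: supplementary code points go through
    their UTF-16 surrogate pair, each surrogate written as 3 bytes."""
    out = bytearray()
    for cp in code_points:
        if cp >= 0x10000:
            v = cp - 0x10000
            high = 0xD800 | (v >> 10)    # leading surrogate
            low = 0xDC00 | (v & 0x3FF)   # trailing surrogate
            out += utf8_encode_cp(high) + utf8_encode_cp(low)
        else:
            # Assumption: lone surrogate code points are passed through.
            out += utf8_encode_cp(cp)
    return bytes(out)


print(utf8_encode_cp(0x10000).hex(" "))        # standard UTF-8 form
print(utf8s_encode([0x10000]).hex(" "))        # one code point
print(utf8s_encode([0xD800, 0xDC00]).hex(" ")) # two code points
```

Under these assumptions the last two calls produce the same six bytes,
so a decoder cannot tell from the byte sequence alone whether the source
was <U-00010000> or <U-0000D800, U-0000DC00>.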

--Ken


