Jianping said:
> > What you finally stated today is that <F0 90 80 80> is flat-out
> > *illegal* in UTF-8s. That was a missing piece of the puzzle for anyone
> > trying to interpret what you are proposing.
> >
>
> In UTF-8S, there should be no irregular forms. Should we repeat the history again?
> Nobody except you thought that 4-byte is allowed in UTF-8S.
False. Do I have to dig out chapter and verse from the email to
show you? Peter Constable certainly did -- and asked you about
it.
Given that UTF-8 already exists, will continue to exist, and will be
confused with UTF-8s, it seems incumbent upon you and other proposers
of UTF-8s to produce a very clear specification of exactly how UTF-8
relates to the proposed UTF-8s.
So far, getting the detailed questions answered has been like pulling
teeth.
> > > That's also your perception, but not Oracle's, as we already support standard UTF-8
> > > encoding in 9i.
> >
> > How is Oracle's support for standard UTF-8 relevant to the conceptual
> > definition of UTF-8s?
>
> That means we do recognize U-00010000 in our implementation for UTF formats.
How is Oracle's support for supplementary characters relevant to my
first question?
> >
> > Now please answer the question for UTF-32 under your formulation of
> > UTF-8s.
>
> My answer here is quite simple:
>
> The UTF-8S code unit sequence <ED A0 80 ED B0 80> *always* corresponds to U+10000.
> It also always corresponds to the UTF-32 code unit sequence <00010000>
> and the UTF-8 code unit sequence <F0 90 80 80>.
>
> No ambiguities, no mapping issues.
You have conveniently ignored again the question that Peter posed to you
days ago, and which I raised, explicitly in the k and l lines in the
comparisons derived from Mark's summary. What do you do with the following
sequence of code points:
<U-0000D800, U-0000DC00>
What is the UTF-8s and UTF-32 representation of that sequence, in your
analysis? And does it or does it not introduce an ambiguity of representation?
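
To make the question concrete, here is a minimal Python sketch of how the
collision could arise. It is not taken from any UTF-8s specification: the
function names are hypothetical, and it assumes an encoder that writes
supplementary characters as two 3-byte surrogate forms and does not reject
surrogate code points on input. Whether UTF-8s is meant to reject them is
exactly what is being asked.

```python
def encode_3byte(cp):
    """Generic 3-byte UTF-8 bit pattern for U+0800..U+FFFF (no surrogate check)."""
    assert 0x0800 <= cp <= 0xFFFF
    return bytes([0xE0 | (cp >> 12),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])


def to_surrogates(cp):
    """Split a supplementary code point into its UTF-16 surrogate pair."""
    assert 0x10000 <= cp <= 0x10FFFF
    v = cp - 0x10000
    return 0xD800 | (v >> 10), 0xDC00 | (v & 0x3FF)


def utf8s_encode(code_points):
    """Hypothetical UTF-8s encoder (assumption: surrogates are not rejected)."""
    out = bytearray()
    for cp in code_points:
        if cp >= 0x10000:
            # Supplementary character: encode each surrogate as 3 bytes.
            hi, lo = to_surrogates(cp)
            out += encode_3byte(hi) + encode_3byte(lo)
        elif cp >= 0x0800:
            out += encode_3byte(cp)
        elif cp >= 0x80:
            out += bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
        else:
            out.append(cp)
    return bytes(out)


one_char = utf8s_encode([0x10000])          # <U-00010000>
two_codes = utf8s_encode([0xD800, 0xDC00])  # <U-0000D800, U-0000DC00>

print(one_char.hex(" "))
print(two_codes.hex(" "))
print(one_char == two_codes)

# Standard UTF-8 uses the 4-byte form for the same character.
print(chr(0x10000).encode("utf-8").hex(" "))
```

Under those assumptions, both inputs produce the bytes ED A0 80 ED B0 80,
so a decoder cannot tell which sequence of code points was intended unless
the specification rules one of them out.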
--Ken