Re: UTF-8S (was: Re: ISO vs Unicode UTF-8)

From: Michael (michka) Kaplan (michka@trigeminal.com)
Date: Tue Jun 05 2001 - 11:37:34 EDT


From: "Mark Davis" <markdavis34@home.com>
> From: "Marco Cimarosti" <marco.cimarosti@essetre.it>

> > But how should this 6-byte sequence be interpreted by a standard UTF-8
> > decoder? Does it become one or two code points?

> It is either one code point (lenient parser) or an error (strict parser). It
> is never two.

This is, I think, the crux of the UTF-8S debate. If the above is acceptable
to SAP, Oracle, PeopleSoft, et al., then no change needs to be made. If
they want either no error in "strict parsers" (such as XML parsers) or want
there to be cases outside of internal use where the sequence is taken as two
code points, then they need to make the proposal -- and need that proposal
to state the need in such a way that it seems like a sensible thing to do
rather than the UTC inheriting the bad implementation decisions of others. :-)
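
To make the three possible readings concrete, here is a rough sketch in
Python 3. It is only an illustration under stated assumptions, not anyone's
actual parser: the function names decode_strict and decode_lenient are made
up for this example, and the sample bytes are simply the surrogate pair
D800 DC00 (i.e. U+10000) with each surrogate written as its own 3-byte
sequence.

```python
# The 6-byte sequence under discussion: the surrogates D800 and DC00,
# each encoded as a separate 3-byte UTF-8-style sequence.
SIX_BYTES = bytes([0xED, 0xA0, 0x80, 0xED, 0xB0, 0x80])


def decode_strict(data: bytes) -> str:
    """Strict reading: encoded surrogates are an error.

    Python's built-in UTF-8 codec behaves this way by default and
    raises UnicodeDecodeError on the input above.
    """
    return data.decode("utf-8")


def decode_lenient(data: bytes) -> str:
    """Lenient reading (hypothetical helper): fold a surrogate pair
    into the single supplementary code point it stands for.

    Step 1 lets the surrogates through as individual code units.
    Step 2 round-trips through UTF-16 so that each high/low pair is
    combined. An unpaired surrogate still fails in step 2.
    """
    units = data.decode("utf-8", "surrogatepass")
    return units.encode("utf-16-le", "surrogatepass").decode("utf-16-le")


if __name__ == "__main__":
    try:
        decode_strict(SIX_BYTES)
    except UnicodeDecodeError as exc:
        print("strict: error:", exc.reason)

    one = decode_lenient(SIX_BYTES)
    print("lenient:", [hex(ord(c)) for c in one])

    # The reading the standard decoder is said never to produce:
    # two separate code points, one per surrogate.
    two = SIX_BYTES.decode("utf-8", "surrogatepass")
    print("two-code-point reading:", [hex(ord(c)) for c in two])
```

The last case uses Python's "surrogatepass" error handler only to show what
"two code points" would look like; it is not offered as a model of what
UTF-8S would specify.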

Note that there has already been rather violent negative reaction from the
W3C side against the idea of supporting any such change here, whether the
UTC accepted a change or not. If it is the eventual goal of these people to
submit a UTC-approved UTF-8 variant then they should consider this fact as
they work to shore up the proposal.

MichKa

Michael Kaplan
Trigeminal Software, Inc.
http://www.trigeminal.com/


