Re: PDUTR #26 posted

From: Michael (michka) Kaplan (michka@trigeminal.com)
Date: Mon Sep 17 2001 - 07:20:38 EDT


From: "Marco Cimarosti" <marco.cimarosti@essetre.it>

> Does renaming "UTF-8S" to "CESU-8" fix all the issues that were
> discussed on this mailing list at the beginning of last spring?

In my opinion (and the opinion of some others), no. But the renaming does
represent an *attempt* to answer them.

> Specifically:
>
> - How will it be ensured that UTF-8 and CESU-8 (former UTF-8S)
> will not be mixed up in the same environment?

The hope is that, since there would be two separate formats, each with only
ONE way to handle supplementary characters, there is a "simple" way to
distinguish them.
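
To make the difference concrete, here is a rough Python sketch (not taken
from the draft; the helper name cesu8_encode is hypothetical) of how one
supplementary character comes out in each form. It assumes CESU-8 encodes
each UTF-16 surrogate as its own 3-byte sequence, as discussed on this list.

```python
def cesu8_encode(text: str) -> bytes:
    """Illustrative CESU-8 encoder (hypothetical helper, not a library API)."""
    out = bytearray()
    for ch in text:
        cp = ord(ch)
        if cp >= 0x10000:
            # Split the supplementary code point into a UTF-16 surrogate pair.
            v = cp - 0x10000
            high = 0xD800 | (v >> 10)
            low = 0xDC00 | (v & 0x3FF)
            # Encode each surrogate as a separate 3-byte sequence.
            for unit in (high, low):
                out += bytes([
                    0xE0 | (unit >> 12),
                    0x80 | ((unit >> 6) & 0x3F),
                    0x80 | (unit & 0x3F),
                ])
        else:
            # BMP characters are encoded exactly as in UTF-8.
            out += ch.encode("utf-8")
    return bytes(out)


ch = "\U00010400"
print(ch.encode("utf-8").hex(" "))   # f0 90 90 80
print(cesu8_encode(ch).hex(" "))     # ed a0 81 ed b0 80
```

So the tell-tale is the lead byte: 0xF0 and up for UTF-8, a pair of 0xED
sequences for CESU-8.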

> How should an UTF-8 application behave if it accidentally receives
> a CESU-8 surrogate sequence? How does an application which
> relies on CESU-8 binary sorting behave if it accidentally receives an
> UTF-8 4-byte sequence?

Both should error out. In practice, I wonder how common such mix-ups would
be and, because of that, how many people will actually do THAT in their
parsers. I expect lots of non-compliant parsers.
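
For what "error out" might look like on each side, a minimal Python sketch
follows. The function names are hypothetical, and the second check is only a
lead-byte test under the assumption that CESU-8 never uses 4-byte forms; it
is not a full CESU-8 validator.

```python
def decode_strict_utf8(data: bytes) -> str:
    # Python's built-in UTF-8 decoder is strict by default and raises
    # UnicodeDecodeError on 3-byte sequences that encode surrogates.
    return data.decode("utf-8")


def reject_utf8_supplementary(data: bytes) -> bytes:
    # Assumption: valid CESU-8 contains only 1- to 3-byte sequences,
    # so any byte >= 0xF0 signals stray UTF-8 (or plain garbage).
    for offset, byte in enumerate(data):
        if byte >= 0xF0:
            raise ValueError(
                f"4-byte UTF-8 lead byte 0x{byte:02X} at offset {offset}"
            )
    return data


cesu8_pair = bytes.fromhex("eda081edb080")   # U+10400 as a surrogate pair
utf8_quad = bytes.fromhex("f0909080")        # U+10400 as 4-byte UTF-8

for check, data in ((decode_strict_utf8, cesu8_pair),
                    (reject_utf8_supplementary, utf8_quad)):
    try:
        check(data)
    except (UnicodeDecodeError, ValueError) as exc:
        print(f"{check.__name__}: rejected ({exc})")
```

A parser that skips either check will quietly accept the other format, which
is the non-compliance I am worried about.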

> - What is the need for an official document that describes "an alternate
> encoding to UTF-8 for internal use"? Lots of applications implement some
> sort of internal hacks, but they don't issue UTF's to tell the world about
> it.

You've got me. I'd rather not see it in any document that contains the word
"Unicode." Personally, I think there already is an answer for *those*
folks -- prior versions of Unicode.

MichKa

Michael Kaplan
(principal developer of the MSLU)
Trigeminal Software, Inc. -- http://www.trigeminal.com/
the book -- http://www.i18nWithVB.com/


