RE: UTF-8S (was: Re: ISO vs Unicode UTF-8)

From: Marco Cimarosti (marco.cimarosti@essetre.it)
Date: Mon Jun 04 2001 - 12:43:37 EDT


Michael (michka) Kaplan wrote:
> FWIW, the proponents of WTF-8 (my favorite name, to date!) answer Doug's

No, please, let's not muddy the waters more than they already are. Let's
keep calling Oracle's proposal "UTF-8S"; there is no point in finding a
cuter name for it.

In particular, this monster does not deserve to steal the name of Markus
Scherer's exquisite provocation
(http://www.mindspring.com/~markus.scherer/unicode/wcode.html#wtf-8).

> point in the proposal, by claiming that people should detect that it is
> UTF-8. After all, it is not illegal for someone reading a UTF-8 file to
> accept 6-byte supplementary characters, only illegal to emit them. Their
> argument is that it will not hurt current implementations to never detect
> or understand that it is indeed WTF-8 since they will still be able to
> read the text.

Wrong point! Perhaps it will not hurt applications which read text from
UTF-8 files and store it in UTF-16 strings.

But it WILL break applications which read text from UTF-8 files and store it
in UTF-*32* strings.

To name just one example, this would undo all the effort that is being
made to bring UTF-8 + UTF-32 to *Linux*.

In UTF-32, 0x00020000 and the pair 0x0000D840, 0x0000DC00 are NOT the same
thing. Making them equivalent would require complications to the algorithm
that defeat the very reason for using UTF-32.
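
To make the difference concrete, here is a rough Python sketch. The function
names (decode_utf8_naive, join_surrogates) are made up for illustration, and
the decoder is deliberately lenient: it does not validate continuation bytes
and it does not reject surrogates, which is my assumption about how a
"tolerant" reader would behave. It is not taken from any real library.

```python
def decode_utf8_naive(data):
    """Decode 1- to 4-byte UTF-8 sequences into a list of 32-bit values.

    Illustrative only: no validation, and surrogate values are let through.
    """
    out = []
    i = 0
    while i < len(data):
        b = data[i]
        if b < 0x80:
            n, cp = 0, b                 # 1-byte sequence (ASCII)
        elif b >> 5 == 0b110:
            n, cp = 1, b & 0x1F          # 2-byte sequence
        elif b >> 4 == 0b1110:
            n, cp = 2, b & 0x0F          # 3-byte sequence
        elif b >> 3 == 0b11110:
            n, cp = 3, b & 0x07          # 4-byte sequence
        else:
            raise ValueError("unexpected lead byte 0x%02X" % b)
        for c in data[i + 1:i + 1 + n]:
            cp = (cp << 6) | (c & 0x3F)  # append 6 payload bits per byte
        out.append(cp)
        i += 1 + n
    return out


def join_surrogates(units):
    """The extra pass a UTF-32 application would need for UTF-8S input."""
    out = []
    i = 0
    while i < len(units):
        u = units[i]
        if (0xD800 <= u <= 0xDBFF and i + 1 < len(units)
                and 0xDC00 <= units[i + 1] <= 0xDFFF):
            # Combine a high/low surrogate pair into one code point.
            out.append(0x10000 + ((u - 0xD800) << 10) + (units[i + 1] - 0xDC00))
            i += 2
        else:
            out.append(u)
            i += 1
    return out


standard = b"\xF0\xA0\x80\x80"        # U+20000 as ordinary 4-byte UTF-8
utf8s = b"\xED\xA1\x80\xED\xB0\x80"   # the same character in 6-byte form

utf32_from_standard = decode_utf8_naive(standard)  # [0x20000]
utf32_from_utf8s = decode_utf8_naive(utf8s)        # [0xD840, 0xDC00]
repaired = join_surrogates(utf32_from_utf8s)       # [0x20000]
```

The two inputs give different UTF-32 arrays unless every reader adds the
second pass, and that pass is exactly the surrogate handling that UTF-32 was
supposed to make unnecessary.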

_ Marco


