Re: UTF-8S (was: Re: ISO vs Unicode UTF-8)

From: Michael (michka) Kaplan (michka@trigeminal.com)
Date: Mon Jun 04 2001 - 12:00:01 EDT


From: <Peter_Constable@sil.org>
> On 06/04/2001 02:10:35 AM Doug Ewell wrote:

> >While we are at it, here's another argument against the existence of both
> >UTF-8 and this new UTF-8s. Recently there was a discussion about the use of
> >the U+FEFF signature in UTF-8 files, with a fair number of Unicode experts
> >arguing against its necessity because UTF-8 is so easy to detect
> >heuristically. Without reopening that debate, it is worth noting that UTF-8s
> >could not be distinguished from UTF-8 by that technique...
>
> I hope some UTC members are listening to these arguments, particularly some
> that weren't already strongly opposed to the UTF-8s proposal.

Have no fear on *that* one. :-)

FWIW, the proponents of WTF-8 (my favorite name, to date!) answer Doug's
point in the proposal by claiming that readers should simply detect it as
UTF-8. After all, it is not illegal for someone reading a UTF-8 file to
accept supplementary characters encoded in six bytes (a surrogate pair, three
bytes per surrogate), only illegal to emit them. Their argument is that
current implementations lose nothing by never detecting or understanding that
the data is in fact WTF-8, since they will still be able to read the text.
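
To make the byte-level point concrete, here is a rough Python sketch of what
a lenient reader sees. The function names are made up for illustration and
come from neither the proposal nor any library; the only real machinery is
Python's standard "surrogatepass" error handler. It assumes the six-byte
form is simply each UTF-16 surrogate encoded as its own three-byte sequence.

```python
def encode_six_byte(ch: str) -> bytes:
    """Encode one supplementary character as two 3-byte surrogate sequences.

    Hypothetical helper: models the six-byte form discussed in the thread.
    """
    units = ch.encode("utf-16-be")          # 4 bytes: high then low surrogate
    hi = int.from_bytes(units[:2], "big")
    lo = int.from_bytes(units[2:], "big")
    # "surrogatepass" lets the UTF-8 codec emit bytes for a lone surrogate.
    return b"".join(chr(u).encode("utf-8", "surrogatepass") for u in (hi, lo))


def lenient_decode(data: bytes) -> str:
    """Accept both the 4-byte and the 6-byte form and return ordinary text."""
    # Step 1: decode, letting encoded surrogates through as code points.
    s = data.decode("utf-8", "surrogatepass")
    # Step 2: round-trip through UTF-16 so surrogate pairs are recombined.
    return s.encode("utf-16-le", "surrogatepass").decode("utf-16-le")


def strict_is_utf8(data: bytes) -> bool:
    """A strict reader: encoded surrogates make the data invalid UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


if __name__ == "__main__":
    ch = "\U00010400"                       # a supplementary character
    standard = ch.encode("utf-8")           # ordinary 4-byte form
    six = encode_six_byte(ch)               # six-byte surrogate form
    print(standard.hex(), six.hex())
    print(lenient_decode(standard) == lenient_decode(six))
    print(strict_is_utf8(standard), strict_is_utf8(six))
```

The lenient reader returns the same text for both inputs, so it has no way
to tell which form the writer used; only a strict reader notices anything,
and it does so by rejecting the data outright.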

Clearly the only people who would care about the distinction are the ones
using it, so the damage when others "mess it up" would thus be minimized?

Very circular argument, which of course further discourages the official
recognition of WTF-8 as an encoding form.

michka



This archive was generated by hypermail 2.1.2 : Fri Jul 06 2001 - 00:17:18 EDT