Re: UTF-8S (was: Re: ISO vs Unicode UTF-8)

From: Peter_Constable@sil.org
Date: Tue Jun 05 2001 - 17:38:07 EDT


On 06/05/2001 04:12:59 PM "Mark Davis" wrote:

>I am not an advocate of UTF-8s -- I am just trying to dispel some of the
>noise here.

I realise that, and didn't mean to suggest that I think *you* are taking
the wrong position. I do appreciate the comments you have made, which, it
seems to me, have been the main comments in response to those in opposition.
I don't understand why the bona fide advocates aren't doing that.

I have some specific answers below, but in general:
>
>1. Strict means according to the Unicode definition. See tr27.

tr27 nowhere defines "strict UTF-8". It merely says:

- don't emit ill-formed sequences
- treat illegal sequences as an error
- don't interpret illegal sequences
- irregular sequences can't be used "for encoding any other information"

(What's odd about the last point is that it is nowhere stated what
information irregular sequences *are* considered to bear.)

It then goes on to define illegal, ill-formed and irregular. (Of course, as
author you know all this; I am repeating it for the benefit of others.) The
only things in Unicode that can be said to be strict regarding UTF-8 are the
terms of C12, and those terms do not, it seems to me, make the distinction
you present in the samples page.

Of course, you are informally reflecting the distinction between regular
sequences and irregular sequences. Formally, though, the standard doesn't
distinguish between these other than to say that you can't generate the
latter or use them "for encoding any other information". It does *not*
identify them as error conditions; that is reserved for *illegal*
sequences, and the sequences in question are not illegal.
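
To make the distinction concrete, here is a rough Python sketch. It assumes
the reading of tr27 in which an irregular sequence is a supplementary
character written as two three-byte sequences, one per surrogate. The
function name classify_utf8 and the four labels are hypothetical, not
anything from the standard. Python's strict decoder stands in for the
legality check and is stricter than tr27 in that it also rejects the
surrogate forms, so those are picked out in a second pass. Lone surrogates
get their own label because they are outside what is being argued here.

```python
def classify_utf8(data: bytes) -> str:
    """Label a byte string 'regular', 'irregular', 'lone-surrogate' or 'illegal'.

    Illustrative only: the labels are this sketch's own, not the standard's.
    """
    # Pass 1: Python's strict decoder accepts only shortest-form sequences
    # that do not encode surrogate code points.
    try:
        data.decode("utf-8")
        return "regular"
    except UnicodeDecodeError:
        pass

    # Pass 2: 'surrogatepass' additionally accepts three-byte encodings of
    # surrogates; anything it still rejects (overlong forms, bad lead or
    # trail bytes, truncation) is treated as illegal in this sketch.
    try:
        text = data.decode("utf-8", errors="surrogatepass")
    except UnicodeDecodeError:
        return "illegal"

    # The only reason pass 1 failed is surrogates; check that each high
    # surrogate is immediately followed by a low surrogate.
    i = 0
    while i < len(text):
        cp = ord(text[i])
        if 0xD800 <= cp <= 0xDBFF:
            if i + 1 < len(text) and 0xDC00 <= ord(text[i + 1]) <= 0xDFFF:
                i += 2
                continue
            return "lone-surrogate"
        if 0xDC00 <= cp <= 0xDFFF:
            return "lone-surrogate"
        i += 1
    return "irregular"


if __name__ == "__main__":
    samples = {
        "four-byte U+10000": "\U00010000".encode("utf-8"),
        "surrogate pair as 3+3 bytes": b"\xed\xa0\x80\xed\xb0\x80",
        "overlong NUL": b"\xc0\x80",
    }
    for name, raw in samples.items():
        print(name, "->", classify_utf8(raw))
```

The point of keeping "irregular" separate from "illegal" in the return value
is exactly the one made above: the text reserves the error condition for the
latter, and a checker that folds the two together is making a choice the
standard does not spell out.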

- Peter

---------------------------------------------------------------------------
Peter Constable

Non-Roman Script Initiative, SIL International
7500 W. Camp Wisdom Rd., Dallas, TX 75236, USA
Tel: +1 972 708 7485
E-mail: <peter_constable@sil.org>


