Re: UTF-8N?

From: Antoine Leca (Antoine.Leca@renault.fr)
Date: Thu Jun 22 2000 - 11:42:00 EDT


Peter_Constable@sil.org wrote:
>
> On 06/22/2000 02:24:49 AM <Antoine.Leca@renault.fr> wrote:
>
> >It was my understanding that U+FEFF when received as first character should be
> >seen as BOM and not as a character, and handled accordingly.
>
> When the encoding scheme is known to be UTF-16BE or UTF-16LE, it *must not*
> be interpreted as a BOM. When the encoding scheme is known to be UTF-16
> (i.e. byte order is unknown), then it *must* be interpreted as a BOM.
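To make the rule concrete, here is a minimal Python sketch of a reader that follows it. The helper name `decode_utf16` and the scheme labels passed as plain strings are hypothetical. The big-endian fallback when a "UTF-16" stream carries no BOM is an assumption (the usual default when no higher-level protocol says otherwise).

```python
import codecs

def decode_utf16(data: bytes, scheme: str) -> str:
    """Hypothetical helper applying the BOM rule quoted above.

    scheme is "UTF-16BE" or "UTF-16LE" (byte order known)
    or "UTF-16" (byte order unknown).
    """
    if scheme == "UTF-16BE":
        # Byte order is declared, so a leading FE FF is the character
        # U+FEFF (ZWNBSP) and is kept in the decoded text.
        return data.decode("utf-16-be")
    if scheme == "UTF-16LE":
        # Same rule for little-endian: FF FE decodes to U+FEFF.
        return data.decode("utf-16-le")
    if scheme == "UTF-16":
        # Byte order unknown: a leading U+FEFF is a BOM, so use it to
        # pick the byte order, then drop it.
        if data.startswith(codecs.BOM_UTF16_BE):
            return data[2:].decode("utf-16-be")
        if data.startswith(codecs.BOM_UTF16_LE):
            return data[2:].decode("utf-16-le")
        # Assumption: no BOM present, fall back to big-endian.
        return data.decode("utf-16-be")
    raise ValueError(f"unknown scheme: {scheme}")
```

With these bytes, `decode_utf16(b"\xfe\xff\x00A", "UTF-16")` yields `"A"`, while the same bytes labelled `"UTF-16BE"` yield `"\ufeffA"`.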

Thanks Peter.

So let me ask a slightly different question. What is the name of the
encoding scheme where the byte order is known (for example, any application
on an Intel machine that receives its data from the system, as opposed
to from the network or a similarly hazardous source), and where a
received BOM should be silently eaten?

It is my understanding that the corresponding name on the other side
of the black hole is exactly UTF-8.

If I am right, the correct way to encode an initial ZWNBSP in UTF-8
would then be the byte sequence 0xEF 0xBB 0xBF 0xEF 0xBB 0xBF.

That would still leave open the case of two ZWNBSPs at the beginning
of the file (unlikely, as you said, but valid)...
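A minimal Python sketch of the convention described above, assuming a reader that always strips exactly one leading BOM. The function names are hypothetical. The reader uses Python's built-in `utf-8-sig` codec, which drops a single leading EF BB BF on decode. The writer adds an extra BOM only when the text itself starts with U+FEFF, which produces exactly the 0xEF 0xBB 0xBF 0xEF 0xBB 0xBF sequence for an initial ZWNBSP.

```python
# U+FEFF encoded in UTF-8: b"\xef\xbb\xbf"
BOM_UTF8 = "\ufeff".encode("utf-8")

def encode_for_bom_eating_reader(text: str) -> bytes:
    """Hypothetical writer: protect a leading ZWNBSP with an extra BOM."""
    data = text.encode("utf-8")
    if text.startswith("\ufeff"):
        # The reader will eat this first BOM, leaving the real ZWNBSP.
        data = BOM_UTF8 + data
    return data

def decode_eating_one_bom(data: bytes) -> str:
    """Hypothetical reader: silently drop one leading BOM, if present."""
    # "utf-8-sig" removes a single leading EF BB BF and otherwise
    # decodes exactly like "utf-8".
    return data.decode("utf-8-sig")
```

For example, `encode_for_bom_eating_reader("\ufeffA")` returns `b"\xef\xbb\xbf\xef\xbb\xbfA"`, and decoding that gives back `"\ufeffA"`.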

Antoine


