Re: UTF-8N?

From: Antoine Leca (Antoine.Leca@renault.fr)
Date: Thu Jun 22 2000 - 11:42:00 EDT

Next message: Arijit Upadhyay: "Re: Bengali: variants of same conjunct"
Previous message: Christopher John Fynn: "Re: UTF-8N?"
Maybe in reply to: Masahiko Maedera: "UTF-8N?"
Next in thread: Ayers, Mike: "RE: UTF-8N?"

Peter_Constable@sil.org wrote:
>
> On 06/22/2000 02:24:49 AM <Antoine.Leca@renault.fr> wrote:
>
> >It was my understanding that U+FEFF when received as first character
> >should be seen as BOM and not as a character, and handled accordingly.
>
> When the encoding scheme is known to be UTF-16BE or UTF-16LE, it *must not*
> be interpreted as a BOM. When the encoding scheme is known to be UTF-16
> (i.e. byte order is unknown), then it *must* be interpreted as a BOM.
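For illustration only: Python's standard codecs (which are not part of this thread) draw the same line Peter describes. The "utf-16" codec, used when the byte order is unknown, treats a leading FE FF or FF FE as a BOM and consumes it. The "utf-16-be" codec, where the byte order is declared, keeps U+FEFF as an ordinary character (ZWNBSP). A minimal sketch:

```python
# Four bytes: FE FF (a big-endian BOM) followed by "A" in big-endian UTF-16.
data = b"\xfe\xff\x00A"

# Byte order unknown ("utf-16"): FE FF is interpreted as a BOM and consumed.
as_utf16 = data.decode("utf-16")

# Byte order known ("utf-16-be"): U+FEFF must stay a character, so it is kept.
as_utf16be = data.decode("utf-16-be")

print(repr(as_utf16))    # 'A'
print(repr(as_utf16be))  # '\ufeffA'
```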

Thanks Peter.

Now let me ask a slightly different question. What is the name of the
encoding where the byte order is known (for example, any application
on an Intel machine that receives its data from the system, as opposed
to from the network or a similarly hazardous source), and where a
received BOM should be silently eaten up?
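A sketch of the behaviour I mean, using Python purely for illustration. The name decode_utf16le_eat_bom is a hypothetical helper, not a standard codec: the byte order is assumed to be known in advance (little-endian, as on Intel), and a single leading U+FEFF is discarded if present.

```python
def decode_utf16le_eat_bom(data: bytes) -> str:
    """Hypothetical helper: decode data known to be little-endian UTF-16,
    silently dropping one leading U+FEFF if present."""
    # "utf-16-le" never strips U+FEFF by itself, so do it explicitly.
    text = data.decode("utf-16-le")
    if text.startswith("\ufeff"):
        text = text[1:]  # eat exactly one BOM, no more
    return text

print(repr(decode_utf16le_eat_bom(b"\xff\xfeA\x00")))          # 'A'
print(repr(decode_utf16le_eat_bom(b"A\x00")))                  # 'A'
print(repr(decode_utf16le_eat_bom(b"\xff\xfe\xff\xfeA\x00")))  # '\ufeffA'
```

Under such a convention, data that genuinely starts with a ZWNBSP has to carry U+FEFF twice, as the last call shows.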

It is my understanding that the corresponding name on the other side
of the black hole is exactly UTF-8.

If I am right, the correct way to encode an initial ZWNBSP in UTF-8
would then be the byte sequence 0xEF 0xBB 0xBF 0xEF 0xBB 0xBF.

That would still leave open the case of having two ZWNBSPs at the
beginning of the file (as you said, unlikely, but valid)...
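As one point of comparison, Python's "utf-8-sig" codec behaves like the BOM-eating encoding described above; it is used here only as an illustration, not as an answer to the naming question. Its encoder always writes EF BB BF, so text that begins with a real ZWNBSP comes out as the doubled sequence, and its decoder eats only the first signature:

```python
# Content that genuinely begins with ZWNBSP (U+FEFF).
text = "\ufeffhello"

# "utf-8-sig" always prepends a signature, so the ZWNBSP follows a BOM.
encoded = text.encode("utf-8-sig")
print(encoded)  # b'\xef\xbb\xbf\xef\xbb\xbfhello'

# Decoding with "utf-8-sig" eats only the first EF BB BF; the ZWNBSP survives.
roundtrip = encoded.decode("utf-8-sig")

# Plain "utf-8" eats nothing: both sequences come back as U+FEFF.
plain = encoded.decode("utf-8")
```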

Antoine


This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:21:04 EDT