RE: UTF-8N?

From: Ayers, Mike (Mike_Ayers@bmc.com)
Date: Thu Jun 22 2000 - 12:15:17 EDT


>
> On 06/22/2000 02:24:49 AM <Antoine.Leca@renault.fr> wrote:
>
> >It was my understanding that U+FEFF when received as first character
> >should be seen as BOM and not as a character, and handled accordingly.
>
> When the encoding scheme is known to be UTF-16BE or UTF-16LE, it *must
> not* be interpreted as a BOM. When the encoding scheme is known to be
> UTF-16 (i.e. byte order is unknown), then it *must* be interpreted as a
> BOM. But in the case of UTF-8, there is no requirement either way, and so
> it is ambiguous: you don't know if it's supposed to be a BOM or ZWNBSP
> (unlikely as an initial character, but valid).
>
>
> Peter Constable
>

        Am I reading this wrong? Here's what I get:

        I hand you a UTF-16 document. This document is:

FE FF 00 48 00 65 00 6C 00 6C 00 6F

        ...so it says "Hello". Then I say, "Oh, by the way, that's
big-endian." *POOF* The content of the document has changed, and there is
now a 'ZERO WIDTH NO-BREAK SPACE' at the beginning. Smells pretty skunky...
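
        The effect is easy to reproduce. Here is a minimal sketch in
Python, assuming its standard "utf-16" and "utf-16-be" codecs are a fair
stand-in for the UTF-16 and UTF-16BE encoding schemes discussed above (the
variable names are mine, purely for illustration):

```python
# The twelve bytes from the example above.
data = bytes.fromhex("FE FF 00 48 00 65 00 6C 00 6C 00 6F")

# Labeled "UTF-16": the leading FE FF is consumed as a byte order mark.
as_utf16 = data.decode("utf-16")

# Labeled "UTF-16BE": the same two bytes decode to the character U+FEFF.
as_utf16be = data.decode("utf-16-be")

print(repr(as_utf16))    # five characters
print(repr(as_utf16be))  # six characters, the first being U+FEFF

# The UTF-8 case from the quoted message: the plain codec keeps U+FEFF,
# while Python's separate "utf-8-sig" codec treats it as a signature.
utf8_data = b"\xef\xbb\xbfHello"
as_utf8 = utf8_data.decode("utf-8")
as_utf8_sig = utf8_data.decode("utf-8-sig")
```

        Same bytes, different label, different character count. That is
the part that bothers me.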

        BTW, what is a ZWNBSP anyway? From here it seems like a
non-character. Is there an actual use for it? Some of the things I've read
here imply that there is; if someone would be so kind as to elucidate, I'd
appreciate it.

/|/|ike


