RE: UTF-8N?

From: Ayers, Mike (Mike_Ayers@bmc.com)
Date: Thu Jun 22 2000 - 12:15:17 EDT

Next message: Robert A. Rosenberg: "RE: How to distinguish UTF-8 from Latin-* ?"
Previous message: Gary L. Wade: "UTF-8 BOM Nonsense"
Maybe in reply to: Masahiko Maedera: "UTF-8N?"
Next in thread: John Cowan: "Re: UTF-8N?"

>
> On 06/22/2000 02:24:49 AM <Antoine.Leca@renault.fr> wrote:
>
> >It was my understanding that U+FEFF when received as first character
> >should be seen as BOM and not as a character, and handled accordingly.
>
> When the encoding scheme is known to be UTF-16BE or UTF-16LE, it
> *must not* be interpreted as a BOM. When the encoding scheme is known to
> be UTF-16 (i.e. byte order is unknown), then it *must* be interpreted as
> a BOM. But in the case of UTF-8, there is no requirement either way, and
> so it is ambiguous: you don't know if it's supposed to be a BOM or ZWNBSP
> (unlikely as an initial character, but valid).
>
>
> Peter Constable
>
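
For the UTF-8 case Peter describes, one illustration (not part of the original thread) is how Python's standard codecs handle it: since there is no rule either way, Python makes the caller choose. The "utf-8" codec keeps a leading U+FEFF as an ordinary character, while "utf-8-sig" treats a leading EF BB BF as a signature and drops it. A minimal sketch:

```python
# UTF-8 has no rule for a leading U+FEFF, so Python offers two codecs
# and leaves the interpretation to the caller.
data = "\ufeffHello".encode("utf-8")   # bytes EF BB BF 48 65 6C 6C 6F

kept = data.decode("utf-8")           # U+FEFF stays in the text as a character
stripped = data.decode("utf-8-sig")   # leading EF BB BF dropped as a signature

print(data.hex())      # efbbbf48656c6c6f
print(repr(kept))      # '\ufeffHello'
print(repr(stripped))  # 'Hello'
```

Neither choice is detectable from the bytes alone; the decision has to come from outside the data, which is exactly the ambiguity described above.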

Am I reading this wrong? Here's what I get:

I hand you a UTF-16 document. This document is:

FE FF 00 48 00 65 00 6C 00 6C 00 6F

...so it says "Hello". Then I say, "Oh, by the way, that's
big-endian." *POOF* The content of the document has changed, and there is
now a 'ZERO WIDTH NO-BREAK SPACE' at the beginning. Smells pretty skunky...

BTW, what is a ZWNBSP anyway? From here it seems like a
non-character. Is there an actual use for it? Some of the things I've read
here imply that there is; if someone would be so kind as to elucidate, I'd
appreciate it.

/|/|ike


This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:21:04 EDT