Re: Names for UTF-8 with and without BOM

From: Mark Davis (mark.davis@jtcsv.com)
Date: Fri Nov 01 2002 - 17:28:28 EST

Next message: Michael (michka) Kaplan: "Re: Names for UTF-8 with and without BOM"

Previous message: Michael Everson: "Re: ct, fj and blackletter ligatures"
In reply to: Murray Sargent: "RE: Names for UTF-8 with and without BOM"
Next in thread: Michael (michka) Kaplan: "Re: Names for UTF-8 with and without BOM"
Reply: Michael (michka) Kaplan: "Re: Names for UTF-8 with and without BOM"
Reply: Doug Ewell: "Re: Names for UTF-8 with and without BOM"

That is not sufficient. The first three bytes could represent a real content
character (ZWNBSP), or they could be a BOM. The label doesn't tell you.
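To make the ambiguity concrete, here is a minimal sketch in Python. The codec
names "utf-8" and "utf-8-sig" are Python's own labels, not names proposed in
this thread, and the sample bytes are made up for illustration. The same
leading bytes EF BB BF come out as content under one label and are dropped as
a signature under the other:

```python
# Illustrative sample: EF BB BF followed by the ASCII text "abc".
data = b"\xef\xbb\xbfabc"

# Python's "utf-8" codec keeps the leading U+FEFF as a content character.
as_content = data.decode("utf-8")

# Python's "utf-8-sig" codec treats a leading U+FEFF as a signature and drops it.
as_signature = data.decode("utf-8-sig")

print(repr(as_content))
print(repr(as_signature))
```

The bytes are identical in both calls; only the label chosen by the reader
decides whether U+FEFF survives.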

This is similar to the UTF-16 CES vs. the UTF-16BE CES. In the first case, an
initial 0xFE 0xFF represents a BOM, and is not part of the content. In the
second case, it does *not* represent a BOM -- it represents a ZWNBSP, and must
not be stripped. The difference is that with UTF-16 the encoding name tells
you exactly what the situation is.
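
The UTF-16 case can be sketched the same way in Python, assuming its "utf-16"
and "utf-16-be" codecs as stand-ins for the two encoding schemes; the sample
bytes are again only illustrative:

```python
# Illustrative sample: FE FF followed by big-endian "a" (00 61).
data = b"\xfe\xff\x00a"

# Labeled "utf-16": the initial FE FF is read as a BOM and is not content.
with_bom_scheme = data.decode("utf-16")

# Labeled "utf-16-be": the same bytes are content, so U+FEFF is kept.
big_endian_scheme = data.decode("utf-16-be")

print(repr(with_bom_scheme))
print(repr(big_endian_scheme))
```

Here the label alone settles whether the first two bytes are stripped, which
is the property the plain "UTF-8" label lacks.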

Mark
__________________________________
http://www.macchiato.com
► “Eppur si muove” ◄

----- Original Message -----
From: "Murray Sargent" <murrays@exchange.microsoft.com>
To: "Joseph Boyle" <Boyle@siebel.com>
Cc: <unicode@unicode.org>
Sent: Friday, November 01, 2002 12:42
Subject: RE: Names for UTF-8 with and without BOM

> Joseph Boyle says: "It would be useful to have official names to
> distinguish UTF-8 with and without BOM."
>
> To see if a UTF-8 file has no BOM, you can just look at the first three
> bytes. Is this a problem? Typically when you care about a file's
> encoding form, you plan to read the file.
>
> Thanks
> Murray
>
>
>


This archive was generated by hypermail 2.1.5 : Sat Nov 02 2002 - 07:18:46 EST