From: Mark Davis (mark.davis@jtcsv.com)
Date: Fri Nov 01 2002 - 17:28:28 EST
That is not sufficient. The first three bytes could represent a real content
character (ZWNBSP), or they could be a BOM. The label doesn't tell you.
This is similar to UTF-16 CES vs UTF-16BE CES. In the first case, 0xFE 0xFF
represents a BOM, and is not part of the content. In the second case, it
does *not* represent a BOM -- it represents a ZWNBSP, and must not be
stripped. The difference is that in the UTF-16 case the encoding name tells
you exactly what the situation is.
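
As a rough illustration of the point above, here is a minimal Python sketch. It
assumes Python's standard codec names ("utf-8", "utf-8-sig", "utf-16",
"utf-16-be"); the helper name and sample bytes are made up for this example
and are not from the thread. It shows that looking at the first three bytes
only tells you U+FEFF is present, while the UTF-16 labels decide by name
whether FE FF is stripped or kept.

```python
import codecs


def starts_with_utf8_signature(data: bytes) -> bool:
    """Hypothetical helper: True if data begins with EF BB BF (U+FEFF in UTF-8)."""
    return data.startswith(codecs.BOM_UTF8)


# UTF-8: the byte check succeeds, but it cannot say whether U+FEFF is a BOM
# or a ZWNBSP that belongs to the content. The caller has to choose a reading.
sample_utf8 = codecs.BOM_UTF8 + b"A"
as_content = sample_utf8.decode("utf-8")        # keeps U+FEFF as a character
as_signature = sample_utf8.decode("utf-8-sig")  # treats it as a BOM, drops it

# UTF-16 vs UTF-16BE: same bytes, but the label settles the question.
sample_utf16 = b"\xfe\xff\x00A"
labelled_utf16 = sample_utf16.decode("utf-16")       # FE FF is a BOM, consumed
labelled_utf16be = sample_utf16.decode("utf-16-be")  # FE FF is ZWNBSP, kept
```

With a plain "UTF-8" label there is nothing comparable to tell a reader which
of the two UTF-8 readings is the intended one.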
Mark
__________________________________
http://www.macchiato.com
► “Eppur si muove” ◄
----- Original Message -----
From: "Murray Sargent" <murrays@exchange.microsoft.com>
To: "Joseph Boyle" <Boyle@siebel.com>
Cc: <unicode@unicode.org>
Sent: Friday, November 01, 2002 12:42
Subject: RE: Names for UTF-8 with and without BOM
> Joseph Boyle says: "It would be useful to have official names to
> distinguish UTF-8 with and without BOM."
>
> To see if a UTF-8 file has no BOM, you can just look at the first three
> bytes. Is this a problem? Typically when you care about a file's
> encoding form, you plan to read the file.
>
> Thanks
> Murray