Thanks for the link!
The section on the BOM, written by Mark Davis, President of the Unicode
Consortium, gives no indication that a UTF-8 stream should never have a BOM.
I quote:
Q: When a BOM is used, is it only in 16-bit Unicode text?
A: No, a BOM can be used as a signature no matter how the Unicode text is
transformed: UTF-16, UTF-8, UTF-7, etc.
...and...
Q: Can a UTF-8 data stream contain the BOM character (in UTF-8 form)?
A: Yes, UTF-8 can contain a BOM.
...and...
Q: Why wouldn't I always use a protocol that requires a BOM?
A: Where the data is typed, such as a field in a database, a BOM is
unnecessary.
If there is a file on disc called foo.txt, it is clearly not typed data:
nothing outside the file itself declares its encoding.
Thus, it appears to be Mr Davis' opinion that when such a file contains
UTF-8 data, it is quite appropriate for there to be a BOM at the start.
If Mr Freytag still disagrees, I hope he will explain why.
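For what it's worth, here is a minimal sketch in Python of the kind of check I have in mind. It is only an illustration, not anything taken from the FAQ: the function names are made up, and "cp1252" is an assumed stand-in for whatever the system default MBCS encoding happens to be.

```python
import codecs


def starts_with_utf8_bom(data: bytes) -> bool:
    """Return True if the data begins with the UTF-8 signature EF BB BF."""
    return data.startswith(codecs.BOM_UTF8)


def decode_text(data: bytes, fallback: str = "cp1252") -> str:
    """Decode as UTF-8 when a BOM is present, otherwise use the fallback.

    The fallback encoding is an assumption standing in for the
    system default MBCS encoding.
    """
    if starts_with_utf8_bom(data):
        # The "utf-8-sig" codec strips the leading signature while decoding.
        return data.decode("utf-8-sig")
    return data.decode(fallback)
```

A file without the signature might still be UTF-8, of course; the sketch only shows that when the signature is present the decision is trivial.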
Thanks!
- rick cameron
-----Original Message-----
From: Tom Gewecke [mailto:tom@bluesky.org]
Sent: Thursday, 14 February 2002 20:42
To: unicode@unicode.org
Subject: RE: Unicode and end users
>Can you please expand on your statement that UTF-8 should never have a
>BOM? Having one makes it very easy to distinguish a text file that
>contains UTF-8 from one that contains text in the system default MBCS
>encoding.
>
>You may not be surprised to learn that Microsoft (or, at least, one of its
>programmers) does not agree with you. When I save a file from Notepad on
>Windows XP in UTF-8, the file contains a BOM.
It seems there are quite a few answers to these questions in the Unicode
utf-bom faq
http://www.unicode.org/unicode/faq/utf_bom.html
including mention of the Microsoft case and the fact that generally a BOM
can be used with any UTF.
This archive was generated by hypermail 2.1.2 : Fri Feb 15 2002 - 12:15:37 EST