RE: Unicode and end users

From: Rick Cameron (Rick.Cameron@crystaldecisions.com)
Date: Fri Feb 15 2002 - 12:47:54 EST


Thanks for the link!

The section on the BOM, written by Mark Davis, President of the Unicode
Consortium, gives no indication that a UTF-8 stream should never have a BOM.

I quote:

Q: When a BOM is used, is it only in 16-bit Unicode text?

A: No, a BOM can be used as a signature no matter how the Unicode text is
transformed: UTF-16, UTF-8, UTF-7, etc.

...and...

Q: Can a UTF-8 data stream contain the BOM character (in UTF-8 form)?

A: Yes, UTF-8 can contain a BOM.

...and...

Q: Why wouldn't I always use a protocol that requires a BOM?

A: Where the data is typed, such as a field in a database, a BOM is
unnecessary.

If there is a file on disc called foo.txt, it is clearly not typed data.
Thus, it appears to be Mr Davis' opinion that when such a file contains
UTF-8 data, it is quite appropriate for there to be a BOM at the start.
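
To make the point concrete, here is a minimal sketch of how a program
might use the signature to tell the two kinds of file apart. The UTF-8
form of the BOM is the three bytes EF BB BF. The function names are
made up for illustration, and the cp1252 fallback is only an assumed
stand-in for "the system default encoding"; a real program would ask
the platform.

```python
import codecs

def has_utf8_bom(data: bytes) -> bool:
    # codecs.BOM_UTF8 is the byte string b"\xef\xbb\xbf"
    return data.startswith(codecs.BOM_UTF8)

def decode_untyped_text(data: bytes, fallback: str = "cp1252") -> str:
    # With a signature: strip it and decode the rest as UTF-8.
    if has_utf8_bom(data):
        return data[len(codecs.BOM_UTF8):].decode("utf-8")
    # Without one: assume the (hypothetical) system default encoding.
    return data.decode(fallback)
```

A file without the signature may of course still be UTF-8; this sketch
only shows what the BOM buys you when it is present.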

If Mr Freytag still disagrees, I hope he will explain why.

Thanks!

- rick cameron

-----Original Message-----
From: Tom Gewecke [mailto:tom@bluesky.org]
Sent: Thursday, 14 February 2002 20:42
To: unicode@unicode.org
Subject: RE: Unicode and end users

>Can you please expand on your statement that UTF-8 should never have a
>BOM? Having one makes it very easy to distinguish a text file that
>contains UTF-8 from one that contains text in the system default MBCS
>encoding.
>
>You may not be surprised to learn that Microsoft (or, at least, one of
>its programmers) does not agree with you. When I save a file from
>Notepad on Windows XP in UTF-8, the file contains a BOM.

It seems there are quite a few answers to these questions in the Unicode
utf-bom FAQ:

http://www.unicode.org/unicode/faq/utf_bom.html

It mentions the Microsoft case and notes that, in general, a BOM can be
used with any UTF.


