RE: Unicode and end users

From: Rick Cameron (Rick.Cameron@crystaldecisions.com)
Date: Thu Feb 14 2002 - 22:15:41 EST


Can you please expand on your statement that UTF-8 should never have a BOM?
Having one makes it very easy to distinguish a text file that contains UTF-8
from one that contains text in the system default MBCS encoding.
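Here is a minimal Python sketch of that check. It assumes the importer reads the whole file as bytes. It also assumes that any file without the signature is in the system default code page, taken here from locale.getpreferredencoding(). That value may not match the Windows ANSI code page in every runtime configuration. The function name decode_text_file is hypothetical.

```python
import codecs
import locale

def decode_text_file(data: bytes) -> str:
    """Hypothetical importer helper: use a leading UTF-8 BOM as a signature.

    Assumption: bytes without the BOM are in the platform's default
    (MBCS/ANSI) encoding, as reported by locale.getpreferredencoding().
    """
    if data.startswith(codecs.BOM_UTF8):
        # EF BB BF found: drop the signature and decode the rest as UTF-8
        return data[len(codecs.BOM_UTF8):].decode("utf-8")
    # No signature: fall back to the system default encoding
    return data.decode(locale.getpreferredencoding(False))

print(decode_text_file(b"\xef\xbb\xbfcaf\xc3\xa9"))
```

The check is a heuristic. A legacy-encoded file that happens to begin with the bytes EF BB BF would be misread as UTF-8, though that is unlikely in practice.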

You may not be surprised to learn that Microsoft (or, at least, one of its
programmers) does not agree with you. When I save a file from Notepad on
Windows XP in UTF-8, the file contains a BOM.
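For reference, the signature in such a file is U+FEFF encoded as UTF-8, which is the three bytes EF BB BF. The short Python illustration below shows the difference. The standard utf-8-sig codec prepends that signature, and the plain utf-8 codec does not.

```python
import codecs

text = "café"

# "utf-8-sig" prepends the UTF-8 encoded BOM (EF BB BF) to the output
with_bom = text.encode("utf-8-sig")

# Plain "utf-8" emits only the encoded characters, with no signature
without_bom = text.encode("utf-8")

print(with_bom.startswith(codecs.BOM_UTF8))             # True
print(without_bom.startswith(codecs.BOM_UTF8))          # False
print(with_bom[len(codecs.BOM_UTF8):] == without_bom)   # True
```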

(I have no connection with Microsoft - I'm just a programmer who has to
write code to import text files from time to time!)

Thanks

- rick cameron

-----Original Message-----
From: Asmus Freytag [mailto:asmusf@ix.netcom.com]
Sent: Thursday, 14 February 2002 17:46
To: Martin Kochanski; unicode@unicode.org
Subject: Re: Unicode and end users

At 09:22 AM 2/14/02 +0000, Martin Kochanski wrote:
>Are there, in fact, many circumstances in which it is necessary for an
>end user to create files that do *not* have a BOM at the beginning?

In principle, this is a requirement for data labelled *external to the
data* as being in either UTF-16BE or UTF-16LE (ditto for UTF-32BE and
UTF-32LE). These formats *must not* have a BOM.
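Here is a small Python sketch that illustrates the point. Python's codecs are used only as an example. The byte-order-specific utf-16-be and utf-16-le codecs write no BOM, but the generic utf-16 codec prepends one. In the other direction, a decoder that has been told the data is UTF-16BE does not treat a leading FE FF as a signature. It keeps those bytes as the character U+FEFF (ZERO WIDTH NO-BREAK SPACE).

```python
import codecs

text = "A"

# Byte-order-specific codecs: no BOM is written
be = text.encode("utf-16-be")   # b'\x00A'
le = text.encode("utf-16-le")   # b'A\x00'

# Generic codec: a BOM in the platform's native byte order comes first
generic = text.encode("utf-16")
print(generic[:2] in (codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE))  # True

# A BOM inside data labelled UTF-16BE is decoded as content (U+FEFF)
data = codecs.BOM_UTF16_BE + be
print(repr(data.decode("utf-16-be")))  # '\ufeffA'
```

The utf-32-be and utf-32-le codecs behave the same way.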

However, in practice the protocols that label documents this way may not
accept separately edited documents, so the point may be moot.

UTF-8 should *never* contain the BOM.
A./



This archive was generated by hypermail 2.1.2 : Thu Feb 14 2002 - 21:56:58 EST