Murray Sargent wrote on 1998-10-16 00:23 UTC:
> Donald Page wrote:
> > The above attachment should contain all of the Minimum European Subset
> > encoded as UTF-8. I created it for my own testing, but feel free to use
> > it.
> Donald's UTF-8 file should begin with a UTF-8 BOM in order to identify it as
> a UTF-8 encoded file. The starting bytes should be 0xEF 0xBB 0xBF.
No. The MIME attachment should just contain the header line

  Content-Type: text/plain; charset=UTF-8

as specified in RFC 2044, and then the receiving email client should
know how to activate the UTF-8 decoder and how to select an appropriate
font. Most developers of email clients still have to add a bit here to
get this running as it is supposed to work.
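To make the header-based approach concrete, here is a minimal sketch in Python using the standard library's email package. The function names make_utf8_part and read_text_part are hypothetical and chosen only for illustration; the point is that the receiver picks the decoder from the charset parameter of Content-Type, not from any marker bytes in the body.

```python
from email import message_from_bytes
from email.mime.text import MIMEText


def make_utf8_part(text: str) -> bytes:
    """Serialize text as a MIME text/plain part labelled charset=utf-8.

    Hypothetical helper: MIMEText adds the Content-Type header and a
    transfer encoding; no BOM is written into the body.
    """
    part = MIMEText(text, "plain", "utf-8")
    return part.as_bytes()


def read_text_part(raw: bytes) -> str:
    """Decode a single text part using the charset named in its header."""
    msg = message_from_bytes(raw)
    # Fall back to US-ASCII, the MIME default, if no charset is declared.
    charset = msg.get_content_charset() or "us-ascii"
    payload = msg.get_payload(decode=True)  # undo base64 / quoted-printable
    return payload.decode(charset)
```

A sender would call make_utf8_part on the text, and a receiving client would call read_text_part on the received bytes; selecting a font for the decoded characters is a separate step that this sketch does not cover.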
I do not like BOMs. The whole beauty of UTF-8 is that it is stateless,
and introducing byte-order-mark hacks destroys this. What happens to
BOMs in a cut-and-paste context? It just creates a mess.

If you want to switch properly between different encodings, then use
established, complete mechanisms like the MIME charset identifier or the
ISO 2022 ESC sequences. BOMs are just an ugly hack.
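The cut-and-paste concern can be illustrated with a small Python sketch. The helper name concat and the sample strings are made up for this example; it only assumes that pasting text amounts to concatenating byte strings.

```python
import codecs

BOM = codecs.BOM_UTF8  # the three bytes 0xEF 0xBB 0xBF


def concat(*chunks: bytes) -> str:
    """Join UTF-8 byte strings, as a naive cut-and-paste would, and decode."""
    return b"".join(chunks).decode("utf-8")


plain_a = "abc".encode("utf-8")
plain_b = "déf".encode("utf-8")
bom_b = BOM + plain_b  # the same text, saved with a leading BOM

# Plain UTF-8 fragments concatenate cleanly.
clean = concat(plain_a, plain_b)

# A fragment that carries a BOM leaves U+FEFF in the middle of the result.
messy = concat(plain_a, bom_b)
```

Here clean is the expected six-character string, while messy contains an extra U+FEFF character between the two fragments, which every later processing step would have to know to ignore.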
> These bytes are discarded when reading the file in and added when
> writing the file out.
I am not sure what exactly you mean, but I hope it is the following: If
you are working on an unfortunate platform that requires BOMs in all
UTF-8 files, then the email software on that platform should prefix the
BOM to a file whenever a MIME text/plain UTF-8 body part is saved into a
file. If a file starting with a BOM is attached to an email as a
text/plain file, then the BOM should be stripped off and the MIME
charset=UTF-8 header should be added.
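The two conversions described above could be sketched in Python roughly as follows. The function names and the platform_wants_bom flag are hypothetical, and the check that avoids adding a second BOM is an assumption of this sketch, not something stated in the message.

```python
import codecs
from typing import Tuple

BOM = codecs.BOM_UTF8  # the three bytes 0xEF 0xBB 0xBF


def body_to_file_bytes(body: bytes, platform_wants_bom: bool) -> bytes:
    """Bytes to write when saving a text/plain; charset=UTF-8 body part.

    On a platform that insists on BOMs, prefix one (assumption: only if
    the body does not already start with one).
    """
    if platform_wants_bom and not body.startswith(BOM):
        return BOM + body
    return body


def file_to_body_bytes(data: bytes) -> Tuple[bytes, str]:
    """Body bytes and Content-Type value to use when attaching a UTF-8 file.

    A leading BOM is stripped; the encoding is declared in the header.
    """
    if data.startswith(BOM):
        data = data[len(BOM):]
    return data, "text/plain; charset=UTF-8"
```

With this split, the BOM exists only in files on the platform that wants it, and never travels inside the MIME body.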
Markus
--
Markus G. Kuhn, Security Group, Computer Lab, Cambridge University, UK
email: mkuhn at acm.org, home page: <http://www.cl.cam.ac.uk/~mgk25/>