Re: Unicode and end users

From: David Hopwood (david.hopwood@zetnet.co.uk)
Date: Sun Feb 17 2002 - 23:27:57 EST


-----BEGIN PGP SIGNED MESSAGE-----

Lars Kristan wrote:
> Doug Ewell wrote:
> > fine (as are LF->CRLF, stripped BOMs, and maybe even some edge cases
> > like converting between tabs and spaces). If there are any
> > security or spoofing concerns, it's best to leave everything completely
> > untouched.
>
> I see this as a good reason for NOT using BOM in UTF-8 files. CRLF is a
> major nuisance that many Windows programmers need to deal with. It requires
> text vs. binary mode when opening the files, plus the size of the file does not
> match the number of characters written or read. UNIX programs usually don't
> need to bother with all that.

Text files in a known charset should always be opened in binary mode
(that is, what the C stdio API refers to as binary mode). The sets of valid
character sequences that must be accepted or generated for newline are
defined by the file format, *not* by the platform. When designing a new
file format, see UAX #13 (Unicode Newline Guidelines).
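
To illustrate the point, here is a minimal sketch in Python, whose "rb" and
"wb" modes correspond to what C stdio calls binary mode. It assumes a
hypothetical file format that is UTF-8 encoded, accepts CRLF, lone CR, and
lone LF on input, and always generates LF on output. The names read_lines,
write_lines, and NEWLINE are made up for this example, and a real format
would substitute its own newline rules.

```python
import re

# Assumption: this hypothetical format accepts CRLF, lone CR, and lone LF.
# CRLF is listed first so it is matched as a single newline, not two.
NEWLINE = re.compile(rb"\r\n|\r|\n")


def read_lines(path, encoding="utf-8"):
    """Read a text file in binary mode and split it per the format's rules."""
    with open(path, "rb") as f:  # binary mode: no platform newline translation
        data = f.read()
    # Splitting the raw bytes is safe for UTF-8, because the bytes 0x0D and
    # 0x0A never occur inside a multi-byte sequence.
    lines = NEWLINE.split(data)
    # A trailing newline (or an empty file) leaves one empty final element.
    if lines and lines[-1] == b"":
        lines.pop()
    return [line.decode(encoding) for line in lines]


def write_lines(path, lines, newline=b"\n", encoding="utf-8"):
    """Write lines in binary mode using the one newline the format generates."""
    with open(path, "wb") as f:  # binary mode: bytes are written unchanged
        for line in lines:
            f.write(line.encode(encoding) + newline)
```

Because nothing is translated behind the program's back, the same bytes are
read and written on every platform, and the file size matches the number of
bytes the program produced.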

- --
David Hopwood <david.hopwood@zetnet.co.uk>

Home page & PGP public key: http://www.users.zetnet.co.uk/hopwood/
RSA 2048-bit; fingerprint 71 8E A6 23 0E D3 4C E5 0F 69 8C D4 FA 66 15 01
Nothing in this message is intended to be legally binding. If I revoke a
public key but refuse to specify why, it is because the private key has been
seized under the Regulation of Investigatory Powers Act; see www.fipr.org/rip

-----BEGIN PGP SIGNATURE-----
Version: 2.6.3i
Charset: noconv

iQEVAwUBPHB9SzkCAxeYt5gVAQEWVQf+JTx46Df4saTu9p3S/gjr+WOAf+h/cV1t
FyLy0SQA+timqut9POdkpJsF/d+w6YO3wYj/qdUvfLOO7ftBGmQpKZ6ibZ/yR5D1
JpF7F3HENsRSKeOTN68jU6vbb4f/qXoKWP5dEoy1tIfLbb5RJ5pSJA5jvDfN35aO
qfguwm3qfj2HnjTx1/PNIN1BdD9N2z2yl/Hg+kqGOlgPSUwKnH84JbxTupK87S4B
sI+x4QLSZG9sV8qaNpNOprzCVmsPinVLoXzUbmieExFFyBuj9avBoke+S04zPGKy
Fd/B5ycUM6YCFxLI9iu30E7OxcPDIomTxnnL15kuvh2WGZRZ3Itp/Q==
=+69v
-----END PGP SIGNATURE-----


