RE: Unicode and end users

From: Lars Kristan (lars.kristan@hermes.si)
Date: Mon Feb 18 2002 - 05:15:21 EST


Doug Ewell wrote:
> fine (as are LF->CRLF, stripped BOM's, and maybe even some edge cases
> like converting between tabs and spaces). If there are any security or
> spoofing concerns, it's best to leave everything completely untouched.

I see this as a good reason for NOT using a BOM in UTF-8 files. CRLF is a
major nuisance that many Windows programmers have to deal with. It requires
choosing text vs. binary mode when opening files, and the size of a file
does not match the number of characters written or read. UNIX programs
usually don't need to bother with any of that.
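
The size mismatch can be shown with a small Python sketch. It is only an
illustration: the helper name crlf_demo and the sample text are made up
here, and newline="\r\n" is passed explicitly to imitate Windows text mode
on any platform.

```python
import os
import tempfile


def crlf_demo(text):
    """Write text with LF -> CRLF translation, then compare the counts.

    Returns (characters written, bytes on disk, characters read back).
    """
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "crlf.txt")

        # newline="\r\n" makes the text layer write CRLF for every "\n",
        # which is what Windows text mode does by default.
        with open(path, "w", encoding="ascii", newline="\r\n") as f:
            chars_written = f.write(text)

        # The file on disk holds one extra byte per line ending.
        size_on_disk = os.path.getsize(path)

        # newline=None (the default) translates CRLF back to "\n" on read.
        with open(path, "r", encoding="ascii", newline=None) as f:
            chars_read = len(f.read())

    return chars_written, size_on_disk, chars_read


# Two lines, 8 characters as the program sees them, 10 bytes on disk.
print(crlf_demo("one\ntwo\n"))  # (8, 10, 8)
```

A program that opens the same file in binary mode sees all 10 bytes,
including the CR characters, and has to handle them itself.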

Now, expecting UNIX programs to deal with BOMs would introduce a similar
problem. One could say that they will need to anyway, in order to read
UTF-16 files. But I don't believe that will ever happen. UTF-8 is the
perfect solution for UNIX, and UTF-16 will be dealt with by converting
entire files, never by processing them directly (as far as simple grep-like
programs are concerned).
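
A rough Python sketch of the BOM problem for a grep-like tool follows. The
function names (grep_line_start, strip_utf8_bom, utf16_to_utf8) are
hypothetical and chosen only for this example; the byte-oriented matcher is
a stand-in for a simple UNIX tool, not a description of how grep itself is
implemented.

```python
import codecs
from typing import List


def grep_line_start(data: bytes, pattern: bytes) -> List[bytes]:
    """Naive byte-oriented matcher: lines that begin with pattern."""
    return [line for line in data.splitlines() if line.startswith(pattern)]


def strip_utf8_bom(data: bytes) -> bytes:
    """Drop a leading UTF-8 BOM (EF BB BF) if one is present."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


def utf16_to_utf8(data: bytes) -> bytes:
    """Convert a whole UTF-16 file to BOM-less UTF-8 before processing.

    The "utf-16" codec consumes a leading BOM to pick the byte order.
    """
    return data.decode("utf-16").encode("utf-8")


plain = "foo\nbar\n".encode("utf-8")
with_bom = codecs.BOM_UTF8 + plain

# Without a BOM the first line matches; with one it silently does not,
# because the line really starts with the bytes EF BB BF.
print(grep_line_start(plain, b"foo"))
print(grep_line_start(with_bom, b"foo"))

# Every tool would need this extra step if BOMs were expected in UTF-8.
print(grep_line_start(strip_utf8_bom(with_bom), b"foo"))

# UTF-16 input is converted as a whole file, then handled as plain UTF-8.
print(grep_line_start(utf16_to_utf8("foo\nbar\n".encode("utf-16")), b"foo"))
```

The point of the sketch is that the BOM-stripping step would have to be
repeated in every program that reads text, much as CRLF handling is on
Windows.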

Lars Kristan


