RE: Unicode and end users

From: Lars Kristan (lars.kristan@hermes.si)
Date: Mon Feb 18 2002 - 05:15:21 EST

Previous message: David Starner: "Re: UTF-8 was Re: Smiles, faces, etc"
Maybe in reply to: Martin Kochanski: "Unicode and end users"
Next in thread: David Hopwood: "Re: Unicode and end users"
Next in thread: Lars Kristan: "RE: Unicode and end users"
Reply: David Hopwood: "Re: Unicode and end users"

Doug Ewell wrote:
> fine (as are LF->CRLF, stripped BOM's, and maybe even some edge cases
> like converting between tabs and spaces). If there are any security or
> spoofing concerns, it's best to leave everything completely untouched.

I see this as a good reason for NOT using a BOM in UTF-8 files. CRLF is a
major nuisance that many Windows programmers have to deal with. It forces a
choice between text and binary mode when opening files, and the size of the
file no longer matches the number of characters written or read. UNIX
programs usually don't need to bother with any of that.
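
To make the size mismatch concrete, here is a minimal Python sketch. It is
only an illustration: the helper names write_text_crlf and demo are made up
for this example, and newline="\r\n" is used to imitate what a Windows
text-mode stream does on write, so the result is the same on any platform.

```python
import os
import tempfile


def write_text_crlf(path, text):
    """Write text with every '\n' translated to '\r\n' (imitates Windows text mode).

    Returns the number of characters the caller handed to write().
    """
    with open(path, "w", encoding="ascii", newline="\r\n") as f:
        return f.write(text)


def demo():
    """Write three short lines and compare characters written with bytes on disk."""
    text = "one\ntwo\nthree\n"  # 14 characters, 3 of them line feeds
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        written = write_text_crlf(path, text)
        size = os.path.getsize(path)  # bytes actually stored
        with open(path, "rb") as f:   # binary mode: no newline translation
            raw = f.read()
    finally:
        os.remove(path)
    return written, size, raw


if __name__ == "__main__":
    written, size, raw = demo()
    print(written, size, raw)
```

The program is told it wrote 14 characters, but the file holds 17 bytes, one
extra CR per line. Reading the file back in binary mode shows the CRs that
text mode would have hidden.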

Now, expecting UNIX programs to deal with BOMs would introduce a similar
problem. One could say that they will need to anyway, in order to read
UTF-16 files. But I don't believe that will ever happen. UTF-8 is the
perfect solution for UNIX, and UTF-16 will be dealt with by converting
entire files, never by processing them directly (as far as simple grep-like
programs are concerned).
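
As a rough illustration of the problem, the Python sketch below shows a
byte-oriented, grep-like match on the start of a line missing the first line
of a UTF-8 file that begins with a BOM, and a whole-file UTF-16 to UTF-8
conversion of the kind described above. The function names (grep_prefix,
strip_utf8_bom, utf16_to_utf8) and the sample data are assumptions made up
for this example, not part of any existing tool.

```python
import codecs


def grep_prefix(data: bytes, prefix: bytes):
    """Return the lines that start with prefix, comparing raw bytes only."""
    return [line for line in data.split(b"\n") if line.startswith(prefix)]


def strip_utf8_bom(data: bytes) -> bytes:
    """Drop a leading UTF-8 BOM (EF BB BF) if one is present."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


def utf16_to_utf8(data: bytes) -> bytes:
    """Convert a whole UTF-16 buffer to BOM-less UTF-8.

    The "utf-16" codec reads the BOM to pick the byte order and does not
    include it in the decoded text.
    """
    return data.decode("utf-16").encode("utf-8")


plain = "name=Lars\nname=Doug\n".encode("utf-8")
with_bom = codecs.BOM_UTF8 + plain

if __name__ == "__main__":
    print(grep_prefix(plain, b"name="))                     # both lines match
    print(grep_prefix(with_bom, b"name="))                  # first line is missed
    print(grep_prefix(strip_utf8_bom(with_bom), b"name="))  # both lines again
```

The first line of the BOM-prefixed buffer starts with the bytes EF BB BF
rather than "name=", so the simple matcher skips it. Every such tool would
need its own equivalent of strip_utf8_bom, which is the extra burden I am
arguing against.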

Lars Kristan

This archive was generated by hypermail 2.1.2 : Mon Feb 18 2002 - 04:49:47 EST