Re: UTF-8 BOM (Re: Charset declaration in HTML)

From: Julian Bradfield <jcb+unicode_at_inf.ed.ac.uk>
Date: Tue, 17 Jul 2012 18:42:46 +0100

On 2012-07-16, Philippe Verdy <verdy_p_at_wanadoo.fr> wrote:
> I am also convinced that even Shell interpreters on Linux/Unix should
> recognize and accept the leading BOM before the hash/bang starting
> line (which is commonly used for filetype identification and runtime
> behavior), without claiming that they don't know what to do to run the
> file or which shell interpreter to use.

Do you think they should also recognize and accept ISO-2022 escape
sequences before the hashbang? If not, why not?

The kernel doesn't know or care about character sets. It has a little
knowledge of ASCII (or possibly EBCDIC) hardwired, but otherwise it deals
with 8-bit bytes. It has no concept of "text file".
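To make the byte-level point concrete, here is a minimal user-space sketch of the test that decides whether a file is a hashbang script. It assumes, as Linux's `binfmt_script` handler does, that the test is simply whether the first two bytes are `#` and `!`. The function name `looks_like_hashbang` is hypothetical, and this is an illustration rather than kernel code.

```python
# Hypothetical user-space model of the kernel's "#!" check.
# Not actual kernel source: it only mirrors the idea that the test is
# a comparison of raw bytes, with no character set decoded.

UTF8_BOM = b"\xef\xbb\xbf"  # the UTF-8 encoding of U+FEFF

def looks_like_hashbang(first_bytes: bytes) -> bool:
    """Return True if the data starts with the two bytes '#' and '!'."""
    return first_bytes[:2] == b"#!"

plain = b"#!/bin/sh\necho hello\n"
with_bom = UTF8_BOM + plain

print(looks_like_hashbang(plain))     # True
print(looks_like_hashbang(with_bom))  # False: first two bytes are 0xEF 0xBB
```

With the BOM in front, the file is simply not recognized as a script; unless some other binary format handler claims it, execve() would typically fail with ENOEXEC.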

A file to be interpreted via a hashbang could in principle contain
arbitrary binary stuff, be that text in multiple encodings or just
binary data. That stuff belongs to the input to the interpreter, not
to the hashbang line: that line contains a filename which is not
interpreted in any extended charset.
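The claim that the hashbang line carries a filename rather than text can be illustrated the same way. The sketch below uses a hypothetical `parse_hashbang` helper that splits the line into interpreter path and optional argument purely as bytes. It is only loosely modelled on Linux's behaviour (the path ends at the first space or tab, the remainder is one optional argument, nothing is decoded); other kernels differ in details such as line-length limits and argument splitting.

```python
from typing import Optional, Tuple

def parse_hashbang(data: bytes) -> Tuple[bytes, Optional[bytes]]:
    """Hypothetical helper: split a '#!' line into (interpreter, argument).

    Loosely modelled on Linux: the line ends at the first newline,
    surrounding spaces/tabs are stripped, the interpreter path ends at the
    first space or tab, and the (trimmed) rest is a single optional
    argument. Everything stays as bytes; no charset is involved.
    """
    if not data.startswith(b"#!"):
        raise ValueError("not a hashbang file")
    line = data[2:].split(b"\n", 1)[0].strip(b" \t")
    for i, byte in enumerate(line):
        if byte in b" \t":  # only space and tab separate the path
            return line[:i], line[i:].strip(b" \t") or None
    return line, None

print(parse_hashbang(b"#!/usr/bin/env python3\n"))  # (b'/usr/bin/env', b'python3')
print(parse_hashbang(b"#!/bin/sh\r\n"))             # (b'/bin/sh\r', None)
```

The second example shows the same indifference to "text": a CRLF line ending leaves a carriage return in the interpreter path, which is the source of the familiar "bad interpreter" errors on scripts saved with DOS line endings.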

Received on Tue Jul 17 2012 - 12:48:09 CDT