Re: (Informational only: UTF-8 BOM and the real life)

From: Doug Ewell <doug_at_ewellic.org>
Date: Sat, 28 Jul 2012 09:52:24 -0600

Steven Atreju wrote:

> Once more i want to point out that on Unix/POSIX systems the file
> content can be seen as a whole, and i hope and think that this
> will not change. This situation is completely different than on
> Windows, which had textfiles with appended (separated by ^Z or so)
> meta information that was invisible in normal text editors already
> in the ninetees (or even earlier, but i don't know).

^Z as an EOF marker for text files was part of the MS-DOS legacy from
CP/M, where all files were written to a multiple of the disk block size
(I think 128 for CP/M and 512 for MS-DOS 1.x), and there had to be some
way to tell where the real text content ended. New stream-based I/O
calls in MS-DOS 2.0 made this mechanism unnecessary. Unix systems had no
legacy from CP/M, so they never had this problem.
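
To illustrate the mechanism described above, here is a minimal Python
sketch. It assumes that the unused tail of the last block is filled with
^Z (0x1A) and that a reader treats the first ^Z as the end of the text.
The function names and the 128-byte default are hypothetical, chosen for
illustration only; they are not any real CP/M or MS-DOS API.

```python
CTRL_Z = b"\x1a"  # ASCII SUB, used here as the end-of-text marker


def pad_to_block(text: bytes, block_size: int = 128) -> bytes:
    """Pad text with ^Z so the stored length is a multiple of block_size.

    Assumption: text that already fills its last block exactly gets no marker.
    """
    remainder = len(text) % block_size
    if remainder == 0:
        return text
    return text + CTRL_Z * (block_size - remainder)


def strip_at_ctrl_z(stored: bytes) -> bytes:
    """Return the bytes before the first ^Z, or everything if there is none."""
    end = stored.find(CTRL_Z)
    return stored if end == -1 else stored[:end]


if __name__ == "__main__":
    stored = pad_to_block(b"hello\r\n")
    print(len(stored))              # stored size is a whole number of blocks
    print(strip_at_ctrl_z(stored))  # the real text content
```

Once the file system records an exact byte length, as with stream-based
I/O, the padding and the marker are no longer needed.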

> I.e., this is why we do have this messy text OR binary file I/O
> distinction like O_BINARY (for open(2)), "b" (for fopen(3)) or
> binmode (perl(1)). Because without those a text file will see
> End-Of-File at the ^Z, not at the real end of the file.

The reason for the text/binary distinction on DOS and Windows is
conversion between Unix-standard LF and Windows (DOS, CP/M)-standard
CRLF. It might be true that library calls to read a file in text mode
will stop at ^Z, but Notepad and WordPad don't. I know the library
doesn't automatically write ^Z. Almost nobody in the MS world uses the
^Z convention on purpose any more; many don't even know about it.
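
As a rough sketch of the two behaviors discussed here, the following
Python function emulates a DOS-style text-mode read on raw bytes: CRLF
becomes LF, and the read optionally stops at the first ^Z. This is an
emulation under stated assumptions, not the behavior of any particular C
runtime; the name read_text_mode and its stop_at_ctrl_z parameter are
hypothetical.

```python
CTRL_Z = b"\x1a"  # ASCII SUB


def read_text_mode(raw: bytes, stop_at_ctrl_z: bool = True) -> bytes:
    """Emulate a DOS-style text-mode read over raw file bytes.

    Assumptions: the first ^Z (if honored) ends the data, and every
    CRLF pair is translated to a single LF.
    """
    if stop_at_ctrl_z:
        end = raw.find(CTRL_Z)
        if end != -1:
            raw = raw[:end]  # ignore everything from the first ^Z onward
    return raw.replace(b"\r\n", b"\n")


if __name__ == "__main__":
    data = b"line one\r\nline two\r\n\x1atrailing bytes"
    print(read_text_mode(data))                        # stops at ^Z
    print(read_text_mode(data, stop_at_ctrl_z=False))  # keeps reading, as an editor might
```

A binary-mode read corresponds to returning the raw bytes untouched,
which is why the mode matters for any file that is not plain text.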

> (Which rises the immediate question why the Microsoft programmers did
> not embed the meta information in this section at the end of the file.
> But i don't really want to know.)

See above. The intent of ^Z was never to distinguish data from metadata,
as with the Mac data and resource forks.

But of course none of this has anything to do with U+FEFF.

> So do the programmers have to face the same conditions? I don't
> really think so. They prefer driving plain text readers up the wall.
> Successfully.

Again, we don't really have this kind of evil intent, though it's often
fun and convenient for people to imagine we do.

--
Doug Ewell | Thornton, Colorado, USA
http://www.ewellic.org | @DougEwell
Received on Sat Jul 28 2012 - 10:56:11 CDT
