Re: MS/Unix BOM FAQ again (small fix)

From: George W Gerrity (ggerrity@dragnet.com.au)
Date: Thu Apr 11 2002 - 21:38:55 EDT


This thread seems just about ended, and I don't want to be the person
to revive it, but there have been numerous related topics in the past
six months, and nothing in them answers the question that has been
nagging me.

The question is

"Considering the difficulty of actually getting access to a file in
such a manner that the 'endian-ness' of the computer architecture is
NOT transparent, why do we even need a byte-order mark?"

To expand on this, imagine there is a text file in some encoding on
some medium created by a little-endian machine (say a DEC VAX or any
Intel 8080 -- Pentium architecture), and it is to be accessed on a
big-endian machine (say a Macintosh 68000). Unless the two CPUs are
sharing the same RAM in order to share the file data in that RAM, the
data will have to be accessed by reading some storage medium, such as
mag tape, floppy disc, hard disc, CD-ROM, etc., or by some file
transfer method on a network. _All_ of these accessing methods are
either bit-serial or byte-serial, transmitting the most significant
bit of the most significant byte first, and the little/big-endian
storage in the RAM receiving buffers is done correctly by the target
machine. True, the low-level programming in a portable OS such as
*NIX, say, has to take cognizance of endian-ness, but even that is
pretty sparse.
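
To make the scenario concrete, here is a minimal sketch in Python of the
one case where the order could show through: a program that writes 16-bit
code units to a byte stream in its own in-memory order. The helper names
serialize_utf16 and read_units are my own, purely for illustration, and
the sketch assumes text restricted to the BMP so that each character is
exactly one code unit.

```python
import struct

def serialize_utf16(text, byte_order):
    """Write each 16-bit code unit as two bytes.

    byte_order is '<' (least significant byte first) or
    '>' (most significant byte first). Assumes BMP-only text.
    """
    units = [ord(ch) for ch in text]
    return struct.pack(f"{byte_order}{len(units)}H", *units)

def read_units(data, byte_order):
    """Interpret a byte stream as 16-bit code units in the given order."""
    count = len(data) // 2
    return list(struct.unpack(f"{byte_order}{count}H", data[:count * 2]))

# The same character yields two different byte streams.
little = serialize_utf16("A", "<")   # bytes 41 00
big = serialize_utf16("A", ">")      # bytes 00 41

# Reading a stream with the wrong assumption swaps the bytes of each unit.
misread = read_units(little, ">")    # [0x4100], not [0x0041]
```

Under these assumptions the byte stream itself differs between the two
writers, no matter how faithfully the medium carries each byte, and
U+FEFF at the head of the stream comes out as FF FE in one case and
FE FF in the other.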

I acknowledge that the BOM _can_ be used to differentiate between
various encodings -- UTF-8, UTF-16, UTF-32, non-Unicode -- but then,
that has _nothing_ to do with byte order. Perhaps it should be
renamed?
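
As a rough sketch of that second use, the following Python fragment
guesses the encoding form from the leading bytes. The function name
sniff_bom is hypothetical; the BOM constants are the ones in Python's
standard codecs module. The UTF-32 marks are tested before the UTF-16
ones because the UTF-32 little-endian mark (FF FE 00 00) begins with the
UTF-16 little-endian mark (FF FE).

```python
import codecs

# Order matters: test the longer UTF-32 marks before UTF-16.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]

def sniff_bom(data):
    """Return a guessed encoding name from a leading BOM, or None."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name
    return None  # no BOM: non-Unicode, or Unicode without a signature
```

For example, sniff_bom(b"\xef\xbb\xbfhello") gives "utf-8", while a
stream with no mark gives None and has to be identified some other way.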

Or am I missing something important?

George


