Re: MS/Unix BOM FAQ again (small fix)

From: Markus Scherer (markus.scherer@jtcsv.com)
Date: Fri Apr 12 2002 - 12:28:06 EDT


George W Gerrity wrote:

> To expand on this, imagine there is a text file in some encoding on some
> medium created by a little-endian machine (say a DEC Vax or a Macintosh
> 68000), and it is to be accessed on a big-endian machine (any Intel 8080
> -- Pentium architecture). Unless the two CPUs are sharing the same RAM

(Doug set the endiannesses straight.)

> in order to share the file data in that RAM, the data will have to be
> accessed by reading some storage medium, such as mag tape, floppy disc,
> hard disc, CD-ROM, etc, or by some file transfer method on a network.
> _All_ of these accessing methods are either bit-serial or byte-serial,
> transmitting the most significant bit of the most significant byte
> first, and the little/big-endian storage in the RAM receiving buffers is
> done correctly by the target machine. True, the low-level programming in

Well, no, the target machine cannot 'magically do it correctly'. That's why this is an issue not only for Unicode but for all protocols and file formats that use units of 16 bits or larger.
The source machine byte-serializes such units in one particular order, and if there is no way to tell that byte order (by protocol, by format definition, or by a flag in the byte stream), then the target machine may get garbage.
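To make that concrete, here is a minimal Python sketch (my choice of language; the helper names serialize_utf16, detect_utf16_byte_order and decode_utf16 are hypothetical). It writes the same 16-bit UTF-16 code unit in both byte orders, shows that a reader assuming the wrong order silently gets a different character, and then uses a leading U+FEFF (the BOM) as the in-stream flag that tells the reader which order was used. When there is no BOM, the sketch simply falls back to an assumed default, standing in for "by protocol or format definition".

```python
from typing import Optional


def serialize_utf16(text: str, big_endian: bool) -> bytes:
    # Byte-serialize UTF-16 code units in an explicit order, with no BOM.
    return text.encode("utf-16-be" if big_endian else "utf-16-le")


def detect_utf16_byte_order(data: bytes) -> Optional[str]:
    # A leading U+FEFF serialized as FE FF means big-endian,
    # as FF FE means little-endian.
    if data[:2] == b"\xfe\xff":
        return "big"
    if data[:2] == b"\xff\xfe":
        return "little"
    # No flag in the stream: the reader has to rely on the
    # protocol or the format definition instead.
    return None


def decode_utf16(data: bytes, default: str = "big") -> str:
    # Use the BOM if present (and strip it); otherwise assume `default`.
    order = detect_utf16_byte_order(data)
    if order is None:
        order, body = default, data
    else:
        body = data[2:]
    return body.decode("utf-16-be" if order == "big" else "utf-16-le")


if __name__ == "__main__":
    le = serialize_utf16("A", big_endian=False)   # U+0041 as 41 00
    print(le.decode("utf-16-be"))                 # misread as U+4100: garbage
    print(decode_utf16(b"\xff\xfe" + le))         # BOM says little-endian: "A"
```

Python's own "utf-16" codec (without the -be/-le suffix) already does this BOM check when decoding; the explicit version above is only meant to show where the byte-order information has to come from.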

markus



This archive was generated by hypermail 2.1.2 : Fri Apr 12 2002 - 11:01:20 EDT