[unicode] Re: UCS-2 Files

From: mbrown@corp.webb.net
Date: Thu Mar 22 2001 - 17:19:01 EST


> > When is a byte not eight bits?
>
> The Web version of the Oxford English Dictionary
> (http://dictionary.oed.com)
> says a byte is always eight bits:

Well, even my cursory research shows that to be an overstatement.

http://wombat.doc.ic.ac.uk/foldoc/foldoc.cgi?query=byte says:

    A byte may be 9 bits on 36-bit computers. Some older
    architectures used "byte" for quantities of 6 or 7
    bits, and the PDP-10 and IBM 7030 supported "bytes"
    that were actually bit-fields of 1 to 36 (or 64)
    bits! These usages are now obsolete [...]

However, it is not difficult to find character encodings that are defined in terms of, or that refer to, 7-bit "bytes" -- ASCII [1] and ISO-2022-JP [2] being examples thereof.
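
To make the 7-bit point concrete, here is a minimal sketch in Python, assuming its built-in "iso2022_jp" codec is a fair stand-in for the encoding described in [2]. The sample string and the variable names are my own, chosen only for illustration. It checks that every unit of the encoded stream fits in 7 bits, while the same text in UTF-8 needs the eighth bit.

```python
# Encode a short Japanese string with Python's built-in ISO-2022-JP codec.
# (The sample text is arbitrary; any JIS X 0208 characters would do.)
text = "日本語"
encoded = text.encode("iso2022_jp")

# Every unit in the encoded stream, escape sequences included,
# has its high bit clear, i.e. it fits in a 7-bit "byte".
seven_bit_clean = all(unit < 0x80 for unit in encoded)
print(encoded, seven_bit_clean)

# The same text in UTF-8 needs the eighth bit.
utf8_needs_8_bits = any(unit >= 0x80 for unit in text.encode("utf-8"))
print(utf8_needs_8_bits)
```

On an 8-bit channel each of those 7-bit units simply travels in an octet with the top bit unused, which is exactly why the two terms get conflated.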

ISO 2022 [3] in fact defines a byte as "a bit string that is operated upon as a unit" and goes on to say "A graphic character shall have a coded representation comprising one or more 8-bit combinations (bytes) in an 8-bit code, and one or more 7-bit combinations (bytes) in a 7-bit code. Within a coded graphic character set each character shall be represented by the same number of such bit combinations."

So you can see that "octet" is the preferable term when referring to a unit consisting of exactly 8 bits.

 [1] ftp://ftp.ecma.ch/ecma-st/Ecma-006.pdf (close enough)
 [2] http://www.faqs.org/rfcs/rfc1468.html
 [3] ftp://ftp.ecma.ch/ecma-st/Ecma-035.pdf (close enough)


