Re: Why do binary files contain text but text files don't contain binary?

From: Ken Whistler via Unicode <unicode_at_unicode.org>
Date: Fri, 21 Feb 2020 08:28:27 -0800

On 2/21/2020 7:53 AM, Costello, Roger L. via Unicode wrote:
>
> Text files may indeed contain binary (i.e., bytes that are not
> interpretable as characters). Namely, text files may contain newlines,
> tabs, and some other invisible things.
>
> Question: "characters" are defined as only the visible things, right?
>
No. You've gone astray right there. Please read Chapter 2 of the Unicode
Standard, and in particular, Section 2.4, Code Points and Characters:

https://www.unicode.org/versions/Unicode12.0.0/ch02.pdf#G25564

All of the types of characters described there, including control and
format characters, can occur in Unicode plain text (with the exception
of surrogate code points).
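
To make that concrete, here is a small Python sketch (not part of the
Standard, just an illustration) that looks up the General_Category of a
few code points and checks whether each can be encoded as UTF-8. The
sample list and the helper name encodable_as_utf8 are my own choices for
the example.

```python
import unicodedata

# A visible letter, two "invisible" controls (tab, newline), a format
# character (ZERO WIDTH SPACE), a private-use code point, and a lone
# surrogate code point.
samples = ["A", "\t", "\n", "\u200b", "\ue000", "\ud800"]

# General_Category abbreviations: Lu = uppercase letter, Cc = control,
# Cf = format, Co = private use, Cs = surrogate.
categories = {f"U+{ord(ch):04X}": unicodedata.category(ch) for ch in samples}


def encodable_as_utf8(ch: str) -> bool:
    """Return True if the string can be serialized as UTF-8."""
    try:
        ch.encode("utf-8")
        return True
    except UnicodeEncodeError:
        return False


for ch in samples:
    label = f"U+{ord(ch):04X}"
    print(label, categories[label], encodable_as_utf8(ch))
```

Tab and newline come back as category Cc and encode without complaint;
only the surrogate code point (Cs) is rejected by the UTF-8 encoder.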

> I conclude:
>
> Binary files may contain arbitrary text.
>
Binary files can contain *whatever*, including text.
>
> Text files may contain binary, but only a restricted set of binary.
>
The distinction is definitional. A text file contains *only* characters,
interpretable by a specific character encoding (usually Unicode, these
days).
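
One rough way to test that definition in practice is to ask whether
every byte of the file decodes under the encoding you claim for it. The
Python sketch below does that; the function name is_text is hypothetical,
and a successful decode is only a necessary condition, not proof that the
file was intended as text.

```python
def is_text(data: bytes, encoding: str = "utf-8") -> bool:
    """Return True if every byte of data decodes under the given encoding."""
    try:
        data.decode(encoding)
        return True
    except UnicodeDecodeError:
        return False


plain = b"line one\n\tindented line\n"  # newlines and a tab are characters
png_signature = b"\x89PNG\r\n\x1a\n"    # first eight bytes of a PNG file

print(is_text(plain))                     # decodes as UTF-8
print(is_text(png_signature))             # 0x89 cannot start a UTF-8 sequence
print(is_text(png_signature, "latin-1"))  # Latin-1 assigns a character to every byte
```

The last call shows why the encoding has to be specified: the same bytes
are "only characters" under one encoding and not under another.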

But a text file need not be "plain text". An HTML file is an example of
a text file (it contains only a sequence of characters, whose identity
and interpretation are all clearly specified by the Unicode Standard),
but it is not *plain* text. It is *rich* text, consisting of markup tags
interspersed with runs of plain text.

Another distinction that may be leading you astray is the distinction
between binary file transfer and text file transfer. If you are using
ftp, for example, you can specify use of binary file transfer, *even if*
the file you are transferring is actually a text file. That simply means
that the transfer treats the entire file as a binary blob and copies it
byte-for-byte intact. A text file transfer, on the other hand, may look
for "lines" in a text file and may adjust line endings to suit the
receiving platform's conventions.
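
The difference between the two modes can be sketched in a few lines of
Python. This is a simulation under my own assumptions, not a model of the
FTP protocol itself: binary_transfer and text_transfer are hypothetical
helpers, and the receiving platform's line ending is passed in as a
parameter.

```python
def binary_transfer(data: bytes) -> bytes:
    """Copy the payload unchanged, byte for byte."""
    return bytes(data)


def text_transfer(data: bytes, receiver_newline: bytes = b"\r\n") -> bytes:
    """Rewrite every line ending to the receiver's convention."""
    # Normalize CRLF and bare CR to LF first, then emit the target ending.
    normalized = data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")
    return normalized.replace(b"\n", receiver_newline)


text_file = b"one\ntwo\n"
png_signature = b"\x89PNG\r\n\x1a\n"

# Binary mode is safe for any file, text or not.
print(binary_transfer(text_file) == text_file)

# Text mode adapts a text file to the receiver's line endings ...
print(text_transfer(text_file))

# ... but alters a file that is not text.
print(text_transfer(png_signature, b"\n") == png_signature)
```

Text mode is a convenience for files that really are text; applied to
anything else, the line-ending rewrite changes the content.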

> Do you agree?
>
No.

--Ken
Received on Fri Feb 21 2020 - 10:28:52 CST
