Re: Is the binaryness/textness of a data format a property?

From: Adam Borowski via Unicode <unicode_at_unicode.org>
Date: Fri, 20 Mar 2020 13:46:25 +0100

On Fri, Mar 20, 2020 at 12:21:26PM +0000, Costello, Roger L. via Unicode wrote:
> [Definition] Property: an attribute, quality, or characteristic of something.
>
> JPEG is a binary data format.
> CSV is a text data format.
>
> Question #1: Is the binaryness/textness of a data format a property?
>
> Question #2: If the answer to Question #1 is yes, then what is the name of
> this binaryness/textness property?

I'm afraid this question is too fuzzy to have a proper answer.

For example, most Unix-heads will tell you that UTF-16LE is a binary rather
than a text format. Microsoft employees and some members of this list will
disagree.

Then you have PostScript -- nothing but basic ASCII, yet utterly unreadable
to a (sane) human.

If you want _my_ definition of a file being _technically_ text, it's:
* no bytes 0..31 other than newlines and tabs (even form feeds are out
  nowadays)
* correctly encoded for the expected charset (and nowadays, if that's not
  UTF-8 Unicode, you're doing it wrong)
* no invalid characters
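
The three rules above can be sketched as a small Python check. This is only
an illustration under stated assumptions, not a definitive test: the function
name is_technically_text and the allow_cr switch are made up for the example,
the expected charset is taken to be UTF-8, "newlines" is read as LF only
unless allow_cr is set, and "invalid characters" is read as code points whose
general category is Cn (unassigned or noncharacter) in the Unicode version
bundled with the running Python.

```python
import unicodedata

# Control bytes permitted by the first rule: tab and line feed.
# Assumption: "newlines" means LF; CR is accepted only on request.
ALLOWED_CONTROLS = {0x09, 0x0A}


def is_technically_text(data: bytes, allow_cr: bool = False) -> bool:
    """Hypothetical helper applying the three rules to a byte string."""
    allowed = ALLOWED_CONTROLS | ({0x0D} if allow_cr else set())

    # Rule 1: no bytes 0..31 other than newlines and tabs
    # (so form feed, NUL, ESC and friends are all rejected).
    if any(b < 0x20 and b not in allowed for b in data):
        return False

    # Rule 2: correctly encoded for the expected charset (assumed UTF-8).
    try:
        text = data.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return False

    # Rule 3: no invalid characters. Assumption: "invalid" means general
    # category Cn, which covers unassigned code points and noncharacters.
    return all(unicodedata.category(ch) != "Cn" for ch in text)
```

Under these assumptions, "hi".encode("utf-16-le") is rejected because of its
NUL bytes, while a PostScript header such as b"%!PS-Adobe-3.0\n" passes, which
matches both examples given earlier in this message.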

But besides this narrow technical meaning -- is a Word document "text"?
And if it is, why not PowerPoint? This all falls apart.

Meow!

-- 
⢀⣴⠾⠻⢶⣦⠀
⣾⠁⢠⠒⠀⣿⡁ in the beginning was the boot and root floppies and they were good.
⢿⡄⠘⠷⠚⠋⠀                                       -- <willmore> on #linux-sunxi
⠈⠳⣄⠀⠀⠀⠀
Received on Fri Mar 20 2020 - 07:46:40 CDT
