RE: Names for UTF-8 with and without BOM - pragmatic

From: Kent Karlsson (kentk@md.chalmers.se)
Date: Thu Nov 07 2002 - 14:41:31 EST


    > Initial for each piece, as each is assumed to be a complete text
    > file before concatenation. Nothing prevents copy/cp/cat and other
    > commands from recognizing Unicode signatures, for as long as they
    > don't claim to preserve initial U+FEFF.

    Yes, there is, in a formal sense, for cat and cp. See
    http://www.opengroup.org/onlinepubs/007904975/utilities/cat.html
    which states "The standard output shall contain the sequence of
    *bytes* read from the input files. Nothing else shall be written
    to the standard output." (my emphasis) and
    http://www.opengroup.org/onlinepubs/007904975/utilities/cp.html
    which is less explicit, but implicitly assumes that copying
    does not change the bytes of the file content in any way.
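
    As a minimal sketch of what "the sequence of bytes read from the
    input files" implies for signatures, the Python below imitates
    byte-exact concatenation. The helper name cat_bytes and the sample
    contents are hypothetical, chosen only for illustration; this is
    not the implementation of any real cat.

```python
import codecs

def cat_bytes(*chunks: bytes) -> bytes:
    """Join byte strings exactly as given, adding and removing nothing."""
    return b"".join(chunks)

# Two hypothetical UTF-8 "files", each starting with a signature (EF BB BF).
first = codecs.BOM_UTF8 + "abc\n".encode("utf-8")
second = codecs.BOM_UTF8 + "def\n".encode("utf-8")

joined = cat_bytes(first, second)

# Decoding as plain UTF-8 keeps every U+FEFF, including the one that now
# sits in the middle of the text, at the start of the second piece.
text = joined.decode("utf-8")
feff_count = text.count("\ufeff")
```

    Under these assumptions the second signature is passed through
    untouched, so the result contains U+FEFF twice: once at the start
    and once in the middle.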

    cat and copy/cp are content-agnostic programs. They just copy
    (or concatenate) the byte strings, regardless of whether the
    content is pictures, sound, or text. So 'cat' can "meaningfully"
    concatenate text files that use the *same* encoding serialisation,
    carry *no* BOM/signature, and are properly terminated (in the
    case of stateful serialisations). Trying
    to get 'cat' to do more than that for text files would be just
    as bad as trying to get 'cat' to join (in some "useful" way)
    picture files (of possibly different formats) or sound or video
    files. Don't expect a useful result from using cat to catenate
    files of those types when each is "complete" in itself. 'cat' is
    *supposed* to be simple, and to just string byte sequences
    together. If you want something more, use another program
    that does the "more" you're looking for (or write one).
    That program is not the Unix/Linux utility 'cat', nor is it cp.
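
    A rough sketch of what such "another program" might do, in Python:
    it drops a leading UTF-8 signature from each piece and optionally
    writes a single one at the start. The function name
    concat_utf8_texts and its keep_signature parameter are hypothetical,
    and the sketch assumes every input really is UTF-8; it does not
    detect or convert other encodings.

```python
import codecs

def concat_utf8_texts(pieces, keep_signature=False):
    """Concatenate UTF-8 byte strings, removing each piece's leading BOM.

    Assumption: every piece is UTF-8. Only a signature at the very
    start of a piece is removed; U+FEFF elsewhere is left alone.
    """
    out = bytearray()
    if keep_signature:
        # Emit exactly one signature for the combined text.
        out += codecs.BOM_UTF8
    for piece in pieces:
        if piece.startswith(codecs.BOM_UTF8):
            piece = piece[len(codecs.BOM_UTF8):]
        out += piece
    return bytes(out)

# Hypothetical inputs: two pieces with a signature, one without.
parts = [
    codecs.BOM_UTF8 + b"abc\n",
    b"def\n",
    codecs.BOM_UTF8 + b"ghi\n",
]
plain = concat_utf8_texts(parts)
signed = concat_utf8_texts(parts, keep_signature=True)
```

    Unlike cat, this deliberately changes the bytes it is given, which
    is why it has to be a separate tool.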

                    /Kent K


