RE: Names for UTF-8 with and without BOM - pragmatic

From: Joseph Boyle
Date: Wed Nov 20 2002 - 13:28:54 EST


    Working in a large organization whose product includes many configuration and data files in text formats, I can say something about what we have found to work during development, localization, and release engineering across multiple platforms.

    We have eliminated UTF-16 text file formats in favor of UTF-8 because the standard Unix toolkit and other Unix-based tools handle UTF-16 poorly. On the other hand, the BOM on UTF-8 has been useful and has not caused problems when processing files with Unix tools, including in pipelines. Raw concatenation of files, which would produce internal ZWNBSPs, is not part of any of our processing as far as I know.
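
    To illustrate the concatenation concern: a minimal sketch, in Python, of joining UTF-8 inputs so that at most one signature remains at the start. This is not our actual tooling; the function name concat_utf8 and its keep_signature parameter are made up for the example.

```python
import codecs


def concat_utf8(chunks, keep_signature=True):
    """Join UTF-8 byte strings, dropping each input's leading BOM.

    Hypothetical helper for illustration only. At most one BOM is
    written, at the very start of the output, so no U+FEFF lands in
    the middle of the result where it would read as a ZWNBSP.
    """
    bom = codecs.BOM_UTF8  # the three bytes EF BB BF
    parts = []
    for data in chunks:
        # Strip the signature only where it is a prefix; a U+FEFF
        # elsewhere in the input is left untouched.
        if data.startswith(bom):
            data = data[len(bom):]
        parts.append(data)
    body = b"".join(parts)
    return bom + body if keep_signature else body
```

    By contrast, raw byte concatenation (what cat does) would leave the second file's signature embedded in the middle of the output.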

    -----Original Message-----
    From: David Starner
    Sent: Thursday, November 07, 2002 12:14 PM
    To: Markus Scherer
    Cc: unicode
    Subject: Re: Names for UTF-8 with and without BOM - pragmatic

    On Wed, Nov 06, 2002 at 09:47:43AM -0800, Markus Scherer wrote:
    > The fact is that Windows uses UTF-8 and UTF-16 plain text files with
    > signatures (BOMs) very simply, gracefully, and successfully. It has
    > applied what I called the "pragmatic" approach here for about 10
    > years. It just works.

    It just works in an environment where relatively few documents are plain text, and that doesn’t use pipes of text as universal glue. C has been described as a (C)haracter processing language; whether or not that’s accurate, Awk and Perl certainly are; these are all Unix programming languages, and at the heart of what Unix is. The simple Unix program has a stream of text coming in and a stream of text going out, whereas the simple Windows program has a window. What works for Windows may very well not work for Unix.

    David Starner -
    Great is the battle-god, great, and his kingdom--
    A field where a thousand corpses lie. 
      -- Stephen Crane, "War is Kind"
