Re: PRODUCING and DESCRIBING UTF-8 with and without BOM

From: Michael (michka) Kaplan
Date: Mon Nov 04 2002 - 11:07:47 EST

    From: "Joseph Boyle"


    > Software currently under development could use the identifiers for
    > whether to require or emit BOM, like the file requirements checker I have
    > to write, and ICU/uconv.

    Let's separate that into the two issues it represents:

    EMITTING: They could simply choose globally whether to emit the BOM or not.
    If they wanted to get "fancy" they could have a command line option which
    said whether to emit the bytes or not. But that is optional.
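
    A minimal sketch of the emitting side in Python, added for illustration
    and not part of the original message. The names DEFAULT_EMIT_BOM,
    encode_utf8, parse_args and the --bom / --no-bom flags are hypothetical;
    only codecs.BOM_UTF8 and argparse come from the standard library.

```python
import argparse
import codecs

# Hypothetical program-wide choice: emit the UTF-8 BOM or not.
DEFAULT_EMIT_BOM = False


def encode_utf8(text, emit_bom=DEFAULT_EMIT_BOM):
    """Encode text as UTF-8, optionally prefixed with the three BOM bytes."""
    data = text.encode("utf-8")
    return codecs.BOM_UTF8 + data if emit_bom else data


def parse_args(argv):
    """The optional "fancy" part: let the command line override the default."""
    parser = argparse.ArgumentParser(description="Write UTF-8 output.")
    parser.add_argument("--bom", dest="emit_bom", action="store_true",
                        help="prefix the output with the UTF-8 BOM")
    parser.add_argument("--no-bom", dest="emit_bom", action="store_false",
                        help="write the output without a BOM")
    parser.set_defaults(emit_bom=DEFAULT_EMIT_BOM)
    return parser.parse_args(argv)
```

    A caller would pass parse_args(sys.argv[1:]).emit_bom to encode_utf8; a
    program that skips the option simply relies on the global default.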

    INCOMING TEXT: Trivial to simply check. I say (once again) it's THREE BYTES.
    If they are there then there is a BOM. Simple.
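
    The incoming check can be sketched in a few lines of Python. This is an
    illustration, not code from the original message; has_utf8_bom and
    decode_utf8 are hypothetical helper names, while codecs.BOM_UTF8 is the
    standard library constant for the bytes EF BB BF.

```python
import codecs


def has_utf8_bom(data):
    """Return True if the byte string starts with the three UTF-8 BOM bytes."""
    return data.startswith(codecs.BOM_UTF8)


def decode_utf8(data):
    """Decode UTF-8 input, accepting it with or without a leading BOM."""
    if has_utf8_bom(data):
        # Drop the three BOM bytes before decoding the rest.
        data = data[len(codecs.BOM_UTF8):]
    return data.decode("utf-8")
```

    Python's built-in "utf-8-sig" codec behaves the same way when decoding,
    so data.decode("utf-8-sig") is an equivalent one-liner.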

    > The inability to update to one standard all possible consuming software
    > might encounter (or for that matter human customers' opinions) is
    > why producing and checking software has to handle both possibilities.

    But the "both possibilities" are trivial, and it's by no means difficult to do.
    Having a good program that refuses to do a little work to handle three bytes
    is like someone who runs a 100 mile marathon and then refuses to cross the
    finish line because the line is yellow instead of white.

    > What would you mean by "the right thing" as far as emitting BOM? Should
    > conversion programs only allow output of non-BOM? (or with-BOM?) Or should
    > they take the specification in an argument separate from the charset name?
    > As said before this unnecessarily requires extra logic.

    Already answered --- they can make a global decision, like Notepad or other
    programs do. Especially if the programmer finds the idea of making it
    settable to be a huge hardship, they can skip that work and simply choose
    whether they want it or not....

    I plead with you -- keep it SIMPLE. :-)

