Re: Names for UTF-8 with and without BOM

From: Michael (michka) Kaplan
Date: Sun Nov 03 2002 - 09:22:54 EST

    From: <>

    > In particular, I'm thinking of a situation about a year and a half ago
    > (IIRC) in which Michael (and I and others) were strongly opposed to a
    > suggestion that the Unicode Consortium should document a certain variation
    > (perversion, some would say) of one of the Unicode encoding forms that a
    > certain vendor had implemented in their software. On that occasion,
    > Michael (and I and others) were arguing that, just because they had done
    > something in their software, that shouldn't mean that the rest of the
    > world should be forced to support their encoding form.
    > I find it interesting, then, to see Michael saying that, since Notepad
    > sticks a BOM-cum-signature at the start of its UTF-8, the rest of the
    > world should support it.

    I do not see the conflict, or the irony. Remember that what Notepad and
    others do is present mainly because it *is* in the XML standard. What was
    being done by those others with UTF-8 was not a part of the UTF-8 "standard"
    and was in fact specifically disallowed. In the end, note that UTF-8 was not
    compromised; they got their own [non-preferred] encoding scheme for their
    backcompat requirement, and they now have the "job" of making their products
    use it under its own name.

    If someone has a bug or problem in their software, then it is of course
    their responsibility to fix it. On the other hand, if one pays attention to
    a possible (optional) recommendation in a standard, it is the standard's
    responsibility to not make people regret that they paid attention.

    (Which is not to say that they got the "idea" from XML; I am not sure where
    the idea came from. I figure that there was a strong interest in making sure
    that a file saved as UTF-8 would still be considered UTF-8 when reloaded,
    rather than ASCII or ANSI [sic]. This is a good reason for such a decision
    in plain text -- and the fact that XML is after all "just text" is lost on
    no one...)
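
    To illustrate the save-and-reload concern, here is a minimal sketch in
    Python. It is not a description of what Notepad actually does; the
    function name sniff_encoding and the "cp1252" fallback (standing in for
    "ANSI") are assumptions made only for this example.

```python
import codecs


def sniff_encoding(data: bytes, fallback: str = "cp1252") -> str:
    """Guess an encoding for plain text (hypothetical helper).

    The fallback code page is an assumption standing in for "ANSI".
    """
    # A leading EF BB BF signature marks the data as UTF-8.
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"
    # Pure 7-bit data reads the same either way.
    try:
        data.decode("ascii")
        return "ascii"
    except UnicodeDecodeError:
        # Without a signature, this naive reader assumes the legacy code page.
        return fallback


text = "na\u00efve"
with_sig = text.encode("utf-8-sig")    # UTF-8 with the signature prepended
without_sig = text.encode("utf-8")     # UTF-8 without it

# The signed file round-trips; the unsigned one is misread by this reader.
round_trip = with_sig.decode(sniff_encoding(with_sig))
misread = without_sig.decode(sniff_encoding(without_sig))
```

    A smarter reader could also test whether unsigned data is valid UTF-8
    before falling back; the sketch skips that step to show why a signature
    removes the guesswork.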

    Given the strong lack of interest that XML has had in the notion of breaking
    old parsers or valid XML 1.0 streams, it seems unlikely (to me) that they
    would make such a breaking change in a future version of XML.

