Re: Names for UTF-8 with and without BOM

From: Tex Texin
Date: Sat Nov 02 2002 - 21:24:06 EST

    John Cowan wrote:
    > Tex Texin scripsit:
    > > Interestingly, although I didn't study it in detail, looking at rfc 2376
    > > for prioritization over charset conflicts, it seems to recommend
    > > stripping the BOM when converting from utf-16 to other charsets (and
    > > without considering that ucs-4 would like to keep it). (section 5).
    > The point is not to try to convert it into a U+FEFF character or some
    > replacement thereof, like say "?".

    That may be the intent, but it doesn't say that. It should say convert
    BOM to the equivalent BOM for the target encoding, if there is one.
    Instead it says to strip it for other encodings.
    (I wish it was called a signature rather than a BOM for most of these
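
    A minimal sketch of the rule proposed above (carry the signature over
    to the target encoding's own signature if it has one, otherwise drop
    it). The SIGNATURES table and the transcode() helper are illustrative
    names, not anything defined by RFC 2376, and the encoding names are
    assumed to be already normalized to Python's codec spellings.

```python
import codecs

# Signatures (BOMs) per encoding. Encodings missing from this table
# are assumed to have no signature at all (e.g. iso-8859-1).
SIGNATURES = {
    "utf-8": codecs.BOM_UTF8,
    "utf-16-be": codecs.BOM_UTF16_BE,
    "utf-16-le": codecs.BOM_UTF16_LE,
    "utf-32-be": codecs.BOM_UTF32_BE,
    "utf-32-le": codecs.BOM_UTF32_LE,
}


def transcode(data: bytes, source: str, target: str) -> bytes:
    """Convert data from source to target, mapping signature to signature."""
    src_sig = SIGNATURES.get(source, b"")
    had_sig = bool(src_sig) and data.startswith(src_sig)
    # Remove the source signature so it is not decoded as a U+FEFF character.
    body = data[len(src_sig):] if had_sig else data
    out = body.decode(source).encode(target)
    if had_sig:
        # Re-emit the equivalent signature, or nothing if the target has none.
        out = SIGNATURES.get(target, b"") + out
    return out
```

    With this, a signed utf-16-be input keeps a signature when sent to
    utf-8 or utf-32-be and loses it only when the target is something like
    iso-8859-1; an unsigned input never gains one.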

    > > Also, in considering charset conflicts, 2376 fails to consider conflicts
    > > between signature and the encoding declaration. (I have a utf-16BE BOM
    > > and the encoding declaration is for utf-8...).
    > The encoding declaration is supposed to trump all. So it is UTF-8, and
    > since 0xFF is illegal in UTF-8, you blow chunks...
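
    A small sketch of what "blow chunks" looks like in practice, assuming
    a strict decoder and that the declared encoding is simply trusted. The
    bytes 0xFE and 0xFF never occur in well-formed UTF-8, so a utf-16BE
    signature fails on its very first byte. decodes_as_declared() is a
    hypothetical helper name used only for this illustration.

```python
import codecs


def decodes_as_declared(data: bytes, declared: str) -> bool:
    """Return True if data is well-formed in the declared encoding."""
    try:
        data.decode(declared, errors="strict")
        return True
    except UnicodeDecodeError:
        return False


# A document with a utf-16BE signature whose declaration claims utf-8.
declaration = "<?xml version='1.0' encoding='utf-8'?>"
conflicted = codecs.BOM_UTF16_BE + declaration.encode("utf-16-be")

# The same declaration actually stored as utf-8, with a utf-8 signature.
consistent = codecs.BOM_UTF8 + declaration.encode("utf-8")
```

    Here decodes_as_declared(conflicted, "utf-8") is False, while
    decodes_as_declared(consistent, "utf-8") is True.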

    OK, but where is that written?

    > > I'll have to check for a more up-to-date rfc.
    > There is none.

    OK. Sorry if I seem to be difficult. I am just rereading a few things
    with my new understanding to put the picture back together again.

    > --
    > John Cowan
    > I amar prestar aen, han mathon ne nen,
    > han mathon ne chae, a han noston ne 'wilith. --Galadriel, _LOTR:FOTR_

    Tex Texin   cell: +1 781 789 1898
    Xen Master                
    Making e-Business Work Around the World
