Re: Names for UTF-8 with and without BOM

From: John Cowan (jcowan@reutershealth.com)
Date: Sat Nov 02 2002 - 21:13:31 EST


    Tex Texin scripsit:

    > Interestingly, although I didn't study it in detail, I looked at RFC 2376
    > for how it prioritizes charset conflicts, and it seems to recommend
    > stripping the BOM when converting from UTF-16 to other charsets (without
    > considering that UCS-4 would like to keep it). See section 5.

    The point is not to try to convert it into a U+FEFF character or some
    replacement thereof, such as "?".
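
    A minimal sketch of that distinction, using Python's standard codecs.
    The function names are hypothetical and ISO-8859-1 is only an assumed
    target charset; nothing here comes from the RFC itself. The first
    function drops the BOM during conversion, the second treats it as an
    ordinary character and ends up emitting "?".

```python
# Illustrative sketch only; function names are hypothetical.

def convert_stripping_bom(data: bytes, target: str) -> bytes:
    """Convert BOM-prefixed UTF-16 bytes to `target`, dropping the BOM."""
    # The generic "utf-16" codec reads the BOM to choose the byte order
    # and does not include it in the decoded text.
    text = data.decode("utf-16")
    return text.encode(target)


def convert_naive(data: bytes, target: str) -> bytes:
    """Treat the BOM as an ordinary character (the behaviour to avoid)."""
    # "utf-16-be" leaves U+FEFF in the text, so a target charset that
    # cannot represent it ends up with a replacement character.
    text = data.decode("utf-16-be")
    return text.encode(target, errors="replace")


# UTF-16BE signature followed by the text "hi".
sample = b"\xfe\xff" + "hi".encode("utf-16-be")

print(convert_stripping_bom(sample, "iso-8859-1"))  # b'hi'
print(convert_naive(sample, "iso-8859-1"))          # b'?hi'
```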

    > Also, in considering charset conflicts, RFC 2376 fails to consider
    > conflicts between the signature and the encoding declaration. (I have a
    > UTF-16BE BOM and the encoding declaration is for UTF-8...).

    The encoding declaration is supposed to trump all. So it is UTF-8, and
    since 0xFF is illegal in UTF-8, you blow chunks...
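
    A rough illustration of that failure, assuming a strict UTF-8 decoder
    (Python's default). The helper name is hypothetical. The input carries
    a UTF-16BE signature (FE FF), but the decoder is told to believe the
    "utf-8" declaration, so it rejects the very first byte.

```python
# Illustrative sketch only; try_utf8 is a hypothetical helper.

def try_utf8(data: bytes):
    """Return (text, None) on success, or (None, offset of the bad byte)."""
    try:
        return data.decode("utf-8"), None
    except UnicodeDecodeError as err:
        return None, err.start


declaration = '<?xml version="1.0" encoding="utf-8"?>'

# UTF-16BE signature plus UTF-16BE content, despite the utf-8 declaration.
conflicted = b"\xfe\xff" + declaration.encode("utf-16-be")

print(try_utf8(conflicted))                   # (None, 0)
print(try_utf8(declaration.encode("utf-8")))  # decodes cleanly
```

    Neither 0xFE nor 0xFF can appear anywhere in well-formed UTF-8, so a
    strict decoder fails at offset 0 rather than producing garbage.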

    > I'll have to check for a more up-to-date RFC.

    There is none.

    -- 
    John Cowan <jcowan@reutershealth.com>     http://www.reutershealth.com
    I amar prestar aen, han mathon ne nen,    http://www.ccil.org/~cowan
    han mathon ne chae, a han noston ne 'wilith.  --Galadriel, _LOTR:FOTR_
    

