Re: BOM's at Beginning of Web Pages?

From: Tom Gewecke (tom@bluesky.org)
Date: Mon Feb 17 2003 - 08:57:43 EST


    >If this is true -- that U+FEFF is a kind of meta-character that doesn't
    >really belong to the text per se -- then it should be equally true for
    >UTF-8, whether its role is as a true Byte Order Mark (needed in UTF-16
    >and UTF-32 but not UTF-8) or as a signature (potentially useful in all
    >Unicode CES's). Only in its evil-twin role as a zero-width no-break
    >space is it truly part of the text, in which case the previous
    >discussion's comments about white-space characters apply.
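
    To make the "signature" reading concrete, here is a small Python sketch
    (Python is my choice, not something the thread uses) that prints the bytes
    U+FEFF produces in each UTF encoding scheme. It also contrasts Python's
    "utf-8-sig" codec, which drops a leading EF BB BF as a signature, with plain
    "utf-8", which keeps U+FEFF as a character of the decoded text:

```python
# U+FEFF as it appears at the start of a stream in each encoding scheme.
BOM = "\ufeff"

for codec in ("utf-8", "utf-16-be", "utf-16-le", "utf-32-be", "utf-32-le"):
    # bytes.hex(sep) needs Python 3.8+; prints e.g. "ef bb bf" for UTF-8
    print(f"{codec:10} {BOM.encode(codec).hex(' ')}")

# "utf-8-sig" treats a leading EF BB BF as a signature and strips it;
# plain "utf-8" decodes it as the character U+FEFF (ZWNBSP).
data = b"\xef\xbb\xbfHello"
print(repr(data.decode("utf-8-sig")))  # 'Hello'
print(repr(data.decode("utf-8")))      # '\ufeffHello'
```

    The same three bytes are either metadata or text depending only on which
    decoder the reader picks, which is the ambiguity being discussed here.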

    For what it is worth, the XML doc
    (http://www.w3.org/TR/2000/REC-xml-20001006#sec-documents) says this about
    the BOM:

    >Entities encoded in UTF-16 must begin with the Byte Order Mark ... This is
    >an encoding signature, not part of either the markup or the character data
    >of the XML document. XML processors must be able to use this character to
    >differentiate between UTF-8 and UTF-16 encoded documents.
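
    A minimal sketch of that differentiation, assuming a processor that only
    has to tell UTF-8 from UTF-16 by the leading signature bytes and that
    ignores encoding declarations. The helpers sniff_bom and decode_entity are
    hypothetical; codecs.BOM_UTF8, codecs.BOM_UTF16_BE and codecs.BOM_UTF16_LE
    are Python standard-library constants:

```python
from __future__ import annotations

import codecs


def sniff_bom(data: bytes) -> tuple[str | None, int]:
    """Hypothetical helper: return (encoding, signature length) if data
    starts with a UTF-8 or UTF-16 BOM, else (None, 0)."""
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8", len(codecs.BOM_UTF8)
    if data.startswith(codecs.BOM_UTF16_BE):
        return "utf-16-be", len(codecs.BOM_UTF16_BE)
    if data.startswith(codecs.BOM_UTF16_LE):
        return "utf-16-le", len(codecs.BOM_UTF16_LE)
    return None, 0


def decode_entity(data: bytes) -> str:
    """Hypothetical helper: skip the signature (it is not part of the
    markup or character data), then decode; assume UTF-8 if no BOM."""
    encoding, skip = sniff_bom(data)
    return data[skip:].decode(encoding or "utf-8")


print(decode_entity("\ufeff<a/>".encode("utf-16-le")))  # <a/>
```

    Stripping the signature before decoding mirrors the quoted wording: the
    BOM tells the processor how to read the bytes but never reaches the
    document's content.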

    The implication seems to be that in XML, at least, UTF-8 will not have a
    BOM (or an encoding declaration). Other parts of the doc, especially
    Appendix F (Autodetection of Character Encodings), seem to recognize that
    any encoding may come either with or without a BOM. An entity encoded in
    neither UTF-8 nor UTF-16 must also have an encoding declaration.
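
    Appendix F also covers input with no BOM at all, where the first four
    bytes of the XML declaration ("<?xm") give a clue. Below is a rough,
    hypothetical Python sketch of only that case, limited to the two UTF-16
    byte orders and ASCII-compatible encodings (UCS-4 and EBCDIC are left
    out). Appendix F treats 00 3C 00 3F and 3C 00 3F 00 as hints that the
    encoding declaration still has to confirm, so returning UTF-16 directly
    is a simplification:

```python
import re

# Hypothetical pattern for an encoding declaration inside "<?xml ... ?>":
# S 'encoding' Eq, then a quoted EncName ([A-Za-z][A-Za-z0-9._-]*).
_ENC_DECL = re.compile(
    rb'^<\?xml[^>]*?\sencoding\s*=\s*(["\'])([A-Za-z][A-Za-z0-9._-]*)\1'
)


def guess_encoding_without_bom(data: bytes) -> str:
    """Hypothetical helper: guess an entity's encoding when no BOM is present."""
    head = data[:4]
    if head == b"\x00<\x00?":  # 00 3C 00 3F: 16-bit units, big-endian
        return "utf-16-be"
    if head == b"<\x00?\x00":  # 3C 00 3F 00: 16-bit units, little-endian
        return "utf-16-le"
    if head == b"<?xm":  # ASCII-compatible: read the encoding declaration
        match = _ENC_DECL.match(data)
        if match:
            return match.group(2).decode("ascii").lower()
    # Neither a BOM nor an encoding declaration: without external
    # information, XML requires such an entity to be UTF-8.
    return "utf-8"


print(guess_encoding_without_bom(b'<?xml version="1.0" encoding="ISO-8859-1"?>'))
```

    Only after a guess like this can the processor decode the declaration
    properly and check that it agrees with the bytes it actually saw.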



    This archive was generated by hypermail 2.1.5 : Mon Feb 17 2003 - 09:47:09 EST