Re: BOM's at Beginning of Web Pages?

From: Roozbeh Pournader (roozbeh@sharif.edu)
Date: Sat Feb 15 2003 - 22:48:21 EST

    On Sat, 15 Feb 2003, Michael (michka) Kaplan wrote:

    > And to whatever extent UTF-8 has a BOM, it would fall under the same
    > category. Certainly that is how processors that understand the UTF-8
    > BOM deal with it.

    Well, that requires looking into what UTF-8 means in W3C and HTML 4.0 terms:

    What is a character set for interchange over the Internet? Section 6.9
    answers that:

       "The "charset" attributes (%Charset in the DTD) refer to a character
       encoding as described in the section on character encodings. Values
       must be strings (e.g., "euc-jp") from the IANA registry (see
       [CHARSETS] for a complete list)."

    Note especially the "must" above. The [CHARSETS] reference is:

       "[CHARSETS]
           Registered charset values. Download a list of registered charset
           values from
           ftp://ftp.isi.edu/in-notes/iana/assignments/character-sets"

    So it's time to go there. The URL above says:

      "The Character Sets Registry has moved to the following:

       http://www.iana.org/assignments/character-sets"

    OK, we'll go there instead and search for UTF-8. It says:

      "Name: UTF-8 [RFC2279]
       MIBenum: 106
       Source: RFC 2279
       Alias: None"

    That points to RFC 2279. Get a copy from <http://www.ietf.org/rfc/rfc2279.txt>,
    or any other place you like, and search it for FEFF, BOM, ZERO WIDTH NO-BREAK
    SPACE, or the sequence "EF BB BF". None of them appears there.
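
    For reference, the sequence "EF BB BF" is simply what U+FEFF becomes once
    it is encoded as UTF-8. A minimal Python sketch of that, plus a check for
    it at the start of a byte string (the helper name has_utf8_bom is made up
    for illustration; it comes from neither the RFC nor HTML 4.0):

```python
import codecs

# U+FEFF (ZERO WIDTH NO-BREAK SPACE, used as a BOM) encoded as UTF-8
# yields the three bytes EF BB BF.
UTF8_BOM = "\ufeff".encode("utf-8")
assert UTF8_BOM == b"\xef\xbb\xbf" == codecs.BOM_UTF8


def has_utf8_bom(data: bytes) -> bool:
    """Return True if data starts with EF BB BF (hypothetical helper)."""
    return data.startswith(UTF8_BOM)
```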

    > Rather than treating HTML like the SQL standard (lofty goals that no
    > one company completely supports because it would be insane to do it!)
    > they can bend to the actual usage out there and just move on, right?
    > [...]
    > How many browsers plan to refuse to show pages that do not follow HTML
    > 4.0 rules? :-)

    I agree, but the Unicode web page is the buggy thing here, not the specific
    browser that was reported earlier to have a problem with it. That is my
    whole point: one should fix the Unicode web page rather than that browser.

    I also personally believe that any browser should work around the small
    mistakes made by the author (or the authoring software) in some way or
    other, but isn't it better for the author not to make the mistake, or to
    fix it once one finds out about it?
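
    Fixing it on the author's side is not much work. As a rough sketch,
    assuming the page is a local file that is already UTF-8 (the function
    names strip_utf8_bom and fix_page are hypothetical, chosen only for this
    example):

```python
import codecs


def strip_utf8_bom(data: bytes) -> bytes:
    """Drop one leading EF BB BF if present; otherwise return data unchanged."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


def fix_page(path: str) -> bool:
    """Rewrite the file at path without its BOM; return True if it changed."""
    with open(path, "rb") as f:
        original = f.read()
    cleaned = strip_utf8_bom(original)
    if cleaned == original:
        return False  # no BOM found, leave the file untouched
    with open(path, "wb") as f:
        f.write(cleaned)
    return True
```

    Working on raw bytes keeps the rest of the page exactly as the author
    wrote it; only the three leading bytes are removed.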

    roozbeh



    This archive was generated by hypermail 2.1.5 : Sat Feb 15 2003 - 23:15:44 EST