From: Roozbeh Pournader (roozbeh@sharif.edu)
Date: Sat Feb 15 2003 - 22:48:21 EST
On Sat, 15 Feb 2003, Michael (michka) Kaplan wrote:
> And to whatever extent UTF-8 has a BOM, it would fall under the same
> category. Certainly that is how processors that understand the UTF-8
> BOM deal with it.
Well, that means looking into what UTF-8 is in W3C and HTML 4.0 terms:
What is a character set for interchange over the Internet? Section 6.9 
answers that:
   "The "charset" attributes (%Charset in the DTD) refer to a character 
   encoding as described in the section on character encodings. Values 
   must be strings (e.g., "euc-jp") from the IANA registry (see 
   [CHARSETS] for a complete list)."
Note especially the word "must" above. The [CHARSETS] reference is:
   "[CHARSETS]
       Registered charset values. Download a list of registered charset 
       values from
       ftp://ftp.isi.edu/in-notes/iana/assignments/character-sets"
So it's time to go there. The URL above says:
  "The Character Sets Registry has moved to the following:
   http://www.iana.org/assignments/character-sets"
OK, we'll go there instead and search for UTF-8. It says:
  "Name: UTF-8                                              [RFC2279]
   MIBenum: 106
   Source: RFC 2279
   Alias: None"
So the definition is RFC 2279. Get a copy from
<http://www.ietf.org/rfc/rfc2279.txt>, or any other place you like, and
search it for FEFF, BOM, ZERO WIDTH NO-BREAK SPACE, or the sequence
"EF BB BF". None of them can be found there.
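For anyone who wants to see where the "EF BB BF" sequence comes from, here
is a minimal Python sketch. It only shows that encoding U+FEFF as UTF-8
yields those three bytes and how one might test a byte string for them. The
function name starts_with_utf8_bom is made up for illustration and comes
from no specification.

```python
import codecs

# U+FEFF (ZERO WIDTH NO-BREAK SPACE) encoded as UTF-8 yields three bytes.
BOM_BYTES = "\ufeff".encode("utf-8")


def starts_with_utf8_bom(data: bytes) -> bool:
    """Return True if the byte string begins with EF BB BF.

    Hypothetical helper, named here for illustration only.
    """
    # codecs.BOM_UTF8 is the standard library's constant for the same bytes.
    return data.startswith(codecs.BOM_UTF8)


if __name__ == "__main__":
    # Print the three bytes as upper-case hex, separated by spaces.
    print(" ".join(f"{b:02X}" for b in BOM_BYTES))
    print(starts_with_utf8_bom(b"\xef\xbb\xbf<html>"))
    print(starts_with_utf8_bom(b"<html>"))
```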
> Rather than treating HTML like the SQL standard (lofty goals that no
> one company completely supports because it would be insane to do it!)
> they can bend to the actual usage out there and just move on, right?
> [...]
> How many browsers plan to refuse to show pages that do not follow HTML
> 4.0 rules? :-)
I agree, but the Unicode web page is the buggy thing here, not the specific
browser that was reported earlier to have a problem with it. That is my
whole point. One should fix the Unicode web page instead of that browser.
I also personally believe that any browser should fix the small mistakes
made by the author (or the authoring software) in some way or other, but
isn't it better for the author not to make the mistake, or to fix it when
one finds out about it?
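On the author's side the fix can be very small. As a rough sketch, assuming
the page is available as raw bytes, something like the following would drop
a leading "EF BB BF" and leave everything else alone. The name
strip_utf8_bom is hypothetical, and how the bytes are read from and written
back to the page is left out.

```python
UTF8_BOM = b"\xef\xbb\xbf"  # U+FEFF encoded as UTF-8


def strip_utf8_bom(data: bytes) -> bytes:
    """Return data without a single leading EF BB BF, if one is present.

    Hypothetical helper for illustration; only the very start of the
    byte string is examined, and at most one occurrence is removed.
    """
    if data.startswith(UTF8_BOM):
        return data[len(UTF8_BOM):]
    return data
```

A page that does not start with those bytes comes back unchanged, so it is
safe to run this over pages that are already clean.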
roozbeh
This archive was generated by hypermail 2.1.5 : Sat Feb 15 2003 - 23:15:44 EST