From: Michael (michka) Kaplan (michka@trigeminal.com)
Date: Sat Feb 15 2003 - 22:10:10 EST
From: "Roozbeh Pournader" <roozbeh@sharif.edu>
> PS: UTF-16 is an exception to that, since the BOM is not part of the
> document and should be removed for processing.
And to whatever extent UTF-8 has a BOM, it would fall under the same
category. Certainly that is how processors that understand the UTF-8
BOM deal with it.
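A minimal Python sketch of "strip it for processing", assuming the input is raw bytes that may start with a UTF-8 or UTF-16 BOM. The helper name `decode_without_bom` is hypothetical, and UTF-32 is left out on purpose (its little-endian BOM begins with the same two bytes as the UTF-16LE one). The constants come from Python's standard `codecs` module.

```python
import codecs

# Leading byte signatures, checked in order. UTF-32 is deliberately
# omitted from this sketch: FF FE 00 00 would be misread as UTF-16LE.
_BOMS = [
    (codecs.BOM_UTF8, "utf-8"),          # EF BB BF
    (codecs.BOM_UTF16_LE, "utf-16-le"),  # FF FE
    (codecs.BOM_UTF16_BE, "utf-16-be"),  # FE FF
]

def decode_without_bom(data: bytes, default: str = "utf-8") -> str:
    """Decode bytes, treating a leading BOM as a signature, not content."""
    for bom, encoding in _BOMS:
        if data.startswith(bom):
            # The BOM is not part of the document: skip it, then decode.
            return data[len(bom):].decode(encoding)
    # No BOM found: decode with the caller's default encoding.
    return data.decode(default)
```

For the UTF-8 case alone, Python's built-in `utf-8-sig` codec does the same thing, e.g. `b"\xef\xbb\xbf<html>".decode("utf-8-sig")` yields `"<html>"`.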
Rather than treating HTML like the SQL standard (lofty goals that no
one company completely supports because it would be insane to do it!),
they can bend to the actual usage out there and just move on, right?
Even if you ignore the BOM as a BOM, the notion that a zero width
space is legal but a zero width no-break space is not just smacks of
silliness. And at the beginning of an HTML page you are not going to
show it either way: either you stripped it as a BOM, or it has no
visible representation.
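A small Python sketch of the "not shown either way" point, using the standard `unicodedata` module. The `visible` helper is hypothetical and only a crude stand-in for rendering: it drops format (Cf) characters, which is the general category of both U+200B and U+FEFF in current Unicode data.

```python
import unicodedata

ZWSP = "\u200b"    # ZERO WIDTH SPACE
ZWNBSP = "\ufeff"  # ZERO WIDTH NO-BREAK SPACE, the same code point as the BOM

for ch in (ZWSP, ZWNBSP):
    # Print code point, character name, and general category.
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)} {unicodedata.category(ch)}")

def visible(text: str) -> str:
    """Hypothetical, crude stand-in for rendering: drop format (Cf) chars."""
    return "".join(c for c in text if unicodedata.category(c) != "Cf")

page = b"\xef\xbb\xbf<html>"
kept = page.decode("utf-8")          # BOM survives as a leading U+FEFF
stripped = page.decode("utf-8-sig")  # BOM removed as a signature
print(visible(kept) == visible(stripped))
```

Under this approximation, keeping the character and stripping it produce the same visible text.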
How many browsers plan to refuse to show pages that do not follow HTML
4.0 rules? :-)
Of course, if I had a penny for every byte that has been spent
discussing these three bytes sometimes found at the beginning of a
UTF-8 document, I would not be working this weekend; I'd be somewhere
really warm and sunny.
MichKa
This archive was generated by hypermail 2.1.5 : Sat Feb 15 2003 - 22:54:40 EST