From: Jukka K. Korpela (jkorpela@cs.tut.fi)
Date: Fri Sep 22 2006 - 15:12:25 CDT
On Fri, 22 Sep 2006, Mark Cilia Vincenti wrote:
> I'm using SSI to include UTF-8 encoded files within a UTF-8 encoded
> HTML page on IIS (Internet Information Services). The problem is that
> the byte order mark is not being stripped by the SSI parser,
> resulting in BOMs within the HTML body.
Can't you just remove the BOM? It's not needed in UTF-8 encoded data. It
might be thought of as a "signature" from which it is possible to deduce
(guess) the encoding. But for HTML files, you can and should explicitly
specify the encoding in HTTP headers (when they are transmitted via HTTP)
or in <meta> tags or both.
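As a minimal sketch of removing the BOM before the files are ever included, assuming Python is available wherever the include files are prepared (nothing in the original question says so), something like the following could be run over each file. The names strip_utf8_bom and strip_bom_in_file are hypothetical helpers, not part of IIS or of any SSI implementation.

```python
import codecs


def strip_utf8_bom(data: bytes) -> bytes:
    """Return data without a leading UTF-8 BOM (bytes EF BB BF), if present."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


def strip_bom_in_file(path: str) -> bool:
    """Rewrite the file at path without its leading BOM.

    Hypothetical helper: returns True if the file was changed,
    False if it had no BOM to begin with.
    """
    with open(path, "rb") as f:
        data = f.read()
    stripped = strip_utf8_bom(data)
    if len(stripped) == len(data):
        return False  # no BOM found, leave the file untouched
    with open(path, "wb") as f:
        f.write(stripped)
    return True
```

Only a BOM at the very start of the file is removed; the rest of the content is left byte-for-byte as it was. With the BOM gone, the encoding has to be declared explicitly, for example with an HTTP header such as "Content-Type: text/html; charset=utf-8".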
If you can't do that for some reason, and if you can't make the inclusion
mechanism remove the BOM, it shouldn't be an issue, since within data,
U+FEFF (ZERO WIDTH NO-BREAK SPACE, the same character as the BOM) should
be treated as an invisible character that "glues" the characters around it
together for the purposes of rendering, and this should normally do no
harm. Is there some reason to suspect that some browsers neither treat it
that way nor simply ignore it? (The two usually amount to the same thing
in the contexts where a BOM would normally appear as a result of
inclusion.)
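To see what the inclusion leaves in the data, here is a rough Python illustration. It assumes the SSI mechanism simply concatenates the bytes of the included file into the page, which is my reading of the problem description, not something verified against IIS. The sample markup and the name find_embedded_boms are made up for the example.

```python
import codecs
import unicodedata

BOM_CHAR = "\ufeff"  # U+FEFF, the character a UTF-8 BOM decodes to

# Two UTF-8 files, each saved with a leading BOM (sample content only).
page = codecs.BOM_UTF8 + "<p>before</p>".encode("utf-8")
fragment = codecs.BOM_UTF8 + "<p>included</p>".encode("utf-8")

# Assumed behaviour of the include step: plain byte-level concatenation.
served = page + fragment

# "utf-8-sig" drops a BOM only at the very start of the stream,
# so the BOM of the included fragment survives as an ordinary character.
text = served.decode("utf-8-sig")


def find_embedded_boms(s: str) -> list[int]:
    """Return the indexes of every U+FEFF left inside the decoded text."""
    return [i for i, ch in enumerate(s) if ch == BOM_CHAR]


positions = find_embedded_boms(text)
print(positions, unicodedata.name(BOM_CHAR))
```

The leading BOM is consumed as a signature, while the one from the included file remains in the body as U+FEFF, which is the character the browser then has to render as invisible or ignore.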
See also the Unicode BOM FAQ,
http://www.unicode.org/unicode/faq/utf_bom.html
-- Jukka "Yucca" Korpela, http://www.cs.tut.fi/~jkorpela/