Re: Problem with SSI and BOM

From: Jukka K. Korpela (jkorpela@cs.tut.fi)
Date: Fri Sep 22 2006 - 15:12:25 CDT

Next message: Richard Wordingham: "Re: Fw: Unicode & space in programming & l10n"

Previous message: Mike: "Re: Fw: Unicode & space in programming & l10n"
In reply to: Mark Cilia Vincenti: "Problem with SSI and BOM"
Next in thread: Addison Phillips: "Re: Problem with SSI and BOM"
Reply: Addison Phillips: "Re: Problem with SSI and BOM"
Reply: Philippe Verdy: "Re: Problem with SSI and BOM"

On Fri, 22 Sep 2006, Mark Cilia Vincenti wrote:

> I'm using SSI to include UTF-8 encoded files within a UTF-encoded
> HTML page on IIS (Internet Information Services). The problem is that
> the byte order mark is not being stripped by the SSI parser,
> resulting in BOMs within the HTML body.

Can't you just remove the BOM? It's not needed in UTF-8 encoded data. It
might be thought of as a "signature" from which it is possible to deduce
(guess) the encoding. But for HTML files, you can and should explicitly
specify the encoding in HTTP headers (when the files are transmitted via
HTTP), in <meta> tags, or both.
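
As a rough illustration of removing the BOM from the included files before
they are served, here is a minimal Python sketch. It assumes you can
preprocess the files yourself; the function name strip_utf8_bom is made up
for this example and is not part of IIS or any SSI implementation.

```python
import codecs


def strip_utf8_bom(data: bytes) -> bytes:
    """Return data without a leading UTF-8 BOM (bytes EF BB BF), if present.

    Hypothetical helper for illustration only.
    """
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


# A fragment saved with a BOM, and one saved without.
print(strip_utf8_bom(b"\xef\xbb\xbf<p>hi</p>"))  # BOM removed
print(strip_utf8_bom(b"<p>hi</p>"))              # unchanged
```

Only a BOM at the very start of the data is removed; U+FEFF occurring later
in the file is left alone.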

If you can't do that for some reason, and if you can't make the inclusion
mechanism remove the BOM, it shouldn't be an issue. Within data, the BOM
character (U+FEFF, ZERO WIDTH NO-BREAK SPACE) should be treated as an
invisible character that "glues" the characters around it together for the
purposes of rendering, and this should normally do no harm. Is there some
reason to suspect that some browsers neither treat the BOM that way nor
simply ignore it? (The two are usually the same thing in contexts where a
BOM would normally appear as a result of inclusion.)
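
To make concrete what "BOM within data" means here, the following Python
sketch mimics a naive include that splices the fragment's bytes in
verbatim. This is an assumption about how such an inclusion behaves, not a
description of the IIS parser, and simulate_include is a hypothetical name.

```python
def simulate_include(before: str, fragment: bytes, after: str) -> str:
    """Mimic a naive include: splice the fragment bytes in, then decode.

    Hypothetical helper; it does not model any real SSI implementation.
    """
    raw = before.encode("utf-8") + fragment + after.encode("utf-8")
    return raw.decode("utf-8")


# A fragment file that was saved with a leading UTF-8 BOM.
fragment = "\ufeffincluded text".encode("utf-8")

page = simulate_include("<body>", fragment, "</body>")
print("\ufeff" in page)      # True: the BOM is now U+FEFF inside the body
print(page.index("\ufeff"))  # 6: directly after "<body>"
```

The three BOM bytes decode to a single U+FEFF character in the middle of
the page, which is the case discussed above.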

See also the Unicode BOM FAQ,
http://www.unicode.org/unicode/faq/utf_bom.html

-- 
Jukka "Yucca" Korpela, http://www.cs.tut.fi/~jkorpela/

This archive was generated by hypermail 2.1.5 : Fri Sep 22 2006 - 15:16:49 CDT