Re: Problem with SSI and BOM

From: Addison Phillips (addison@yahoo-inc.com)
Date: Fri Sep 22 2006 - 16:39:23 CDT

Next message: Kenneth Whistler: "Re: Fw: Unicode & space in programming & l10n"

Previous message: Henrik Theiling: "Re: Fw: Unicode & space in programming & l10n"
In reply to: Jukka K. Korpela: "Re: Problem with SSI and BOM"
Next in thread: Jukka K. Korpela: "Re: Problem with SSI and BOM"
Reply: Jukka K. Korpela: "Re: Problem with SSI and BOM"
Reply: Doug Ewell: "Re: Problem with SSI and BOM"

Sadly...

See: http://www.w3.org/International/questions/qa-utf8-bom

The BOM is often rendered in the page, throwing off other display
elements. One common problem on Windows is the prevalence of editors
(Notepad!!) that add the UTF-8 BOM to text files stored as "UTF-8".
While one might expect this to act as a "no-op" character, in practice
it doesn't.
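
If the included files can't be re-saved without the signature, one
workaround is to strip it before the files are served. Here is a minimal
Python sketch, assuming the fragments can be preprocessed as raw bytes.
The function name strip_utf8_bom is my own invention, not part of IIS or
any SSI implementation:

```python
import codecs


def strip_utf8_bom(data: bytes) -> bytes:
    """Return data without a leading UTF-8 BOM (bytes EF BB BF), if present.

    Hypothetical helper for illustration; it only inspects the start of
    the data and leaves any later U+FEFF untouched.
    """
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data
```

Only a leading signature is removed, so a U+FEFF that appears later in
the data for some other reason is left alone.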

Addison

Jukka K. Korpela wrote:
> On Fri, 22 Sep 2006, Mark Cilia Vincenti wrote:
>
>> I'm using SSI to include UTF-8 encoded files within a UTF-8 encoded
>> HTML page on IIS (Internet Information Services). The problem is that
>> the byte order mark is not being stripped by the SSI parser,
>> resulting in BOMs within the HTML body.
>
> Can't you just remove the BOM? It's not needed in UTF-8 encoded data. It
> might be thought of as a "signature" from which it is possible to deduce
> (guess) the encoding. But for HTML files, you can and should explicitly
> specify the encoding in HTTP headers (when they are transmitted via
> HTTP) or in <meta> tags or both.
>
> If you can't do that for some reason, and if you can't make the
> inclusion mechanism remove the BOM, it shouldn't be an issue, since
> within data, BOM (U+FEFF, ZERO WIDTH NO-BREAK SPACE) should be treated
> as an invisible character that "glues" the characters around it
> together for the purposes of rendering, and this should normally do no
> harm. Is there some reason to suspect that some browsers don't treat
> BOM either that way or simply ignore it (which is usually the same
> thing, for contexts where BOM would normally appear as a result of
> inclusion)?
>
> See also the Unicode BOM FAQ,
> http://www.unicode.org/unicode/faq/utf_bom.html
>
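
To check whether an assembled page really does carry a BOM inside the
body, something like the following could be run against the bytes the
server sends. This is only a sketch: embedded_bom_offsets is a
hypothetical name, and it assumes the page is valid UTF-8, in which case
the byte sequence EF BB BF can only be an encoded U+FEFF:

```python
import codecs


def embedded_bom_offsets(data: bytes) -> list:
    """Return byte offsets of UTF-8 BOM sequences that are not at offset 0.

    Hypothetical diagnostic helper; a BOM at the very start of the data
    is treated as a normal signature and is not reported.
    """
    offsets = []
    pos = data.find(codecs.BOM_UTF8, 1)
    while pos != -1:
        offsets.append(pos)
        pos = data.find(codecs.BOM_UTF8, pos + 1)
    return offsets
```

A non-empty result means at least one included fragment brought its
signature along into the middle of the page.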

-- 
Addison Phillips
Globalization Architect -- Yahoo! Inc.
Internationalization is an architecture.
It is not a feature.


This archive was generated by hypermail 2.1.5 : Fri Sep 22 2006 - 16:40:56 CDT