From: Addison Phillips (addison@yahoo-inc.com)
Date: Fri Sep 22 2006 - 16:39:23 CDT
Sadly...
See: http://www.w3.org/International/questions/qa-utf8-bom
The BOM is often rendered in the page, throwing off other display
elements. One common problem on Windows is the prevalence of editors
(Notepad!!) that add the UTF-8 BOM to text files stored as "UTF-8".
While one might expect this to act as a "no-op" character, in practice,
it isn't.
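For files that an editor such as Notepad has already saved with a signature, a cleanup step before the files are served or included may help. The following is a minimal sketch, assuming Python 3 and files that really are UTF-8. It checks for the three bytes EF BB BF (U+FEFF encoded in UTF-8) at the very start of a file and drops them. The helper names `strip_utf8_bom` and `clean_file` are hypothetical.

```python
import codecs
from pathlib import Path

# codecs.BOM_UTF8 is b"\xef\xbb\xbf", the UTF-8 encoding of U+FEFF.
UTF8_BOM = codecs.BOM_UTF8


def strip_utf8_bom(data: bytes) -> bytes:
    """Return data without a leading UTF-8 BOM (hypothetical helper)."""
    if data.startswith(UTF8_BOM):
        return data[len(UTF8_BOM):]
    return data


def clean_file(path: Path) -> bool:
    """Rewrite path in place if it starts with a UTF-8 BOM.

    Returns True if the file was changed (hypothetical helper).
    """
    raw = path.read_bytes()
    cleaned = strip_utf8_bom(raw)
    if cleaned != raw:
        path.write_bytes(cleaned)
        return True
    return False
```

Only a BOM at offset 0 is removed. Anything else in the file, including a U+FEFF further in, is left alone, so the step is safe to run repeatedly.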
Addison
Jukka K. Korpela wrote:
> On Fri, 22 Sep 2006, Mark Cilia Vincenti wrote:
>
>> I'm using SSI to include UTF-8 encoded files within a UTF-8 encoded
>> HTML page on IIS (Internet Information Services). The problem is that
>> the byte order mark is not being stripped by the SSI parser,
>> resulting in BOMs within the HTML body.
>
> Can't you just remove the BOM? It's not needed in UTF-8 encoded data. It
> might be thought of as a "signature" from which it is possible to deduce
> (guess) the encoding. But for HTML files, you can and should explicitly
> specify the encoding in HTTP headers (when they are transmitted via
> HTTP) or in <meta> tags or both.
>
> If you can't do that for some reason, and if you can't make the
> inclusion mechanism remove the BOM, it shouldn't be an issue, since
> within data, BOM (U+FEFF, ZERO WIDTH NO-BREAK SPACE) should be treated
> as an invisible character that "glues" the characters around it
> together for the purposes of rendering, and this should normally do no
> harm. Is there some reason to suspect that some browsers neither treat
> BOM that way nor simply ignore it (which is usually the same thing, in
> contexts where BOM would normally appear as a result of inclusion)?
>
> See also the Unicode BOM FAQ,
> http://www.unicode.org/unicode/faq/utf_bom.html
>
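The following minimal simulation shows why the SSI setup described above leaks the signature into the HTML body. It assumes Python 3.9+. `render_page` is a hypothetical stand-in for an include mechanism that concatenates bytes; it does not model IIS's actual SSI parser. Naive concatenation keeps each included file's EF BB BF, which decodes to U+FEFF in the middle of the text. Python's `utf-8-sig` codec removes only a BOM at the very start of its input, so stripping has to happen per included file, before concatenation.

```python
import unicodedata

BOM = "\ufeff".encode("utf-8")  # b"\xef\xbb\xbf"


def render_page(parts: list[bytes], strip_bom: bool) -> str:
    """Concatenate included fragments as a naive include would (hypothetical)."""
    out = b""
    for part in parts:
        if strip_bom and part.startswith(BOM):
            part = part[len(BOM):]  # drop each included file's signature
        out += part
    return out.decode("utf-8")  # plain utf-8 keeps U+FEFF as a character


header = BOM + "<p>Header</p>".encode("utf-8")
body = BOM + "<p>Body</p>".encode("utf-8")

naive = render_page([header, body], strip_bom=False)
fixed = render_page([header, body], strip_bom=True)

print(naive.count("\ufeff"))       # 2: one U+FEFF per included file
print(fixed.count("\ufeff"))       # 0
print(unicodedata.name("\ufeff"))  # ZERO WIDTH NO-BREAK SPACE
# utf-8-sig strips only the leading BOM; the embedded one survives.
print((header + body).decode("utf-8-sig").count("\ufeff"))  # 1
```

Stripping at the point of inclusion, rather than decoding the assembled page with a BOM-aware codec, is what keeps the second and later fragments clean.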
--
Addison Phillips
Globalization Architect -- Yahoo! Inc.
Internationalization is an architecture. It is not a feature.