From: Addison Phillips (addison@yahoo-inc.com)
Date: Fri Sep 22 2006 - 16:39:23 CDT
Sadly...
See: http://www.w3.org/International/questions/qa-utf8-bom
The BOM is often rendered in the page, throwing off other display
elements. One common cause on Windows is the prevalence of editors
(Notepad!!) that add the UTF-8 BOM to text files saved as "UTF-8".
While one might expect the BOM to act as a "no-op" character, in
practice it doesn't.
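
If the SSI parser can't be made to strip it, one workaround is to remove
the BOM from the included files before they are served. What follows is
only a minimal sketch in Python, not something IIS or SSI provides: the
function names strip_utf8_bom and strip_bom_in_file are made up for
illustration, and it assumes the files are small enough to read into
memory in one go.

```python
import codecs


def strip_utf8_bom(data: bytes) -> bytes:
    """Return data without a leading UTF-8 BOM (bytes EF BB BF), if present."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


def strip_bom_in_file(path: str) -> bool:
    """Rewrite the file at path without a leading BOM.

    Returns True if the file was changed, False if it had no BOM.
    (Hypothetical helper; run it over the include files after editing.)
    """
    with open(path, "rb") as f:
        data = f.read()
    cleaned = strip_utf8_bom(data)
    if cleaned == data:
        return False
    with open(path, "wb") as f:
        f.write(cleaned)
    return True
```

Only a BOM at the very start of the file is removed; a U+FEFF that
appears later in the data is left alone, since there it is ordinary
content rather than a signature.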
Addison
Jukka K. Korpela wrote:
> On Fri, 22 Sep 2006, Mark Cilia Vincenti wrote:
> 
>> I'm using SSI to include UTF-8 encoded files within a UTF-8 encoded
>> HTML page on IIS (Internet Information Services). The problem is that 
>> the byte order mark is not being stripped by the SSI parser,
>> resulting in BOMs within the HTML body.
> 
> Can't you just remove the BOM? It's not needed in UTF-8 encoded data. It 
> might be thought of as a "signature" from which it is possible to deduce 
> (guess) the encoding. But for HTML files, you can and should explicitly 
> specify the encoding in HTTP headers (when they are transmitted via 
> HTTP) or in <meta> tags or both.
> 
> If you can't do that for some reason, and if you can't make the 
> inclusion mechanism remove the BOM, it shouldn't be an issue, since 
> within data, BOM (U+FEFF, ZERO WIDTH NO-BREAK SPACE) should be treated
> as an
> invisible character that "glues" the characters around it together for 
> the purposes of rendering, and this should normally do no harm. Is there 
> some reason to suspect that some browsers don't treat BOM either that 
> way or simply ignore it (which is usually the same thing, for contexts 
> where BOM would normally appear as a result of inclusion)?
> 
> See also the Unicode BOM FAQ, 
> http://www.unicode.org/unicode/faq/utf_bom.html
> 
-- 
Addison Phillips
Globalization Architect -- Yahoo! Inc.

Internationalization is an architecture.
It is not a feature.