From: Tom Gewecke (tom@bluesky.org)
Date: Sat Feb 15 2003 - 11:59:47 EST
Michael Everson recently pointed out that the Unicode home page seems to
begin with the character U+FEFF (ZWNBSP/BOM), encoded as UTF-8. Presumably
this is an artifact created by the program used to make the page, although
I haven't noticed it on any others on the site.
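
For anyone who wants to check a saved copy of a page for this, here is a minimal Python sketch. It assumes the page's raw bytes are already in hand; the names UTF8_SIGNATURE and starts_with_utf8_signature are made up for illustration. U+FEFF encoded as UTF-8 is the three bytes EF BB BF.

```python
import codecs

# U+FEFF encoded as UTF-8 is the three-byte sequence EF BB BF.
UTF8_SIGNATURE = "\ufeff".encode("utf-8")


def starts_with_utf8_signature(page_bytes: bytes) -> bool:
    """Return True if the raw page bytes begin with U+FEFF in UTF-8."""
    return page_bytes.startswith(UTF8_SIGNATURE)


if __name__ == "__main__":
    # Hypothetical sample data, not the actual Unicode home page.
    sample = UTF8_SIGNATURE + b"<html><head></head></html>"
    print(UTF8_SIGNATURE.hex())                # the three signature bytes in hex
    print(starts_with_utf8_signature(sample))  # whether the sample starts with them
```

The standard library exposes the same three bytes as codecs.BOM_UTF8, so the constant could equally be taken from there.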
I had a look at the BOM FAQ and am wondering if any list members could
confirm my understanding of the proper use of BOM at the start of web pages:
--The only case where a BOM should be used is when the byte order is not
specified by the encoding/charset listed in the HTML, i.e. UTF-16 or UTF-32.
For all others, including the BE and LE variants of those two, it should
not be used.
--If the page is marked UTF-16 and has no BOM it will be interpreted as
UTF-16BE.
--U+FEFF can appear (presumably by accident) at the beginning of any web
page, but aside from the two cases above where it is necessary, it is a
ZWNBSP and not a BOM. (As Michael pointed out, Mac IE 5.2.2 displays a Euro
symbol.)
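
The three points above, restricted to the UTF-16 family, can be sketched in Python as follows. This only restates my understanding and is not a confirmed reading of the standard. The function name decode_utf16_labelled is hypothetical, and the byte order is handled by hand because Python's own "utf-16" codec falls back to the platform's native order when there is no BOM.

```python
import codecs


def decode_utf16_labelled(data: bytes, charset: str) -> str:
    """Decode bytes according to a declared UTF-16 family charset label."""
    label = charset.lower()
    if label == "utf-16be":
        # Byte order is fixed by the label; a leading U+FEFF stays in the
        # text as a ZWNBSP.
        return data.decode("utf-16-be")
    if label == "utf-16le":
        return data.decode("utf-16-le")
    if label == "utf-16":
        # The BOM, if present, selects the byte order and is dropped.
        if data.startswith(codecs.BOM_UTF16_BE):
            return data[2:].decode("utf-16-be")
        if data.startswith(codecs.BOM_UTF16_LE):
            return data[2:].decode("utf-16-le")
        # No BOM: assume big-endian, as in the second point above.
        return data.decode("utf-16-be")
    raise ValueError(f"not a UTF-16 family label: {charset!r}")


if __name__ == "__main__":
    print(repr(decode_utf16_labelled(b"\xfe\xff\x00A", "UTF-16")))    # BOM consumed
    print(repr(decode_utf16_labelled(b"\x00A", "UTF-16")))            # no BOM, BE assumed
    print(repr(decode_utf16_labelled(b"\xfe\xff\x00A", "UTF-16BE")))  # U+FEFF kept
```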
Suppose a page has no charset/encoding specified in the markup. Does the
presence of U+FEFF mean it should be presumed to be UTF-16? Some of my
browsers behave this way.
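
To make the question concrete, here is a rough Python sketch of BOM sniffing for a page with no declared charset. It is an illustration only, not a description of what any particular browser does; the name sniff_bom and the choice of labels are assumptions. One thing it makes visible is that the encoded form of U+FEFF differs by encoding, so the UTF-8 byte sequence EF BB BF would point to UTF-8 rather than UTF-16.

```python
import codecs
from typing import Optional

# The UTF-32 marks are checked before the UTF-16 ones because the UTF-32LE
# mark (FF FE 00 00) begins with the UTF-16LE mark (FF FE).
_BOM_TABLE = [
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
]


def sniff_bom(data: bytes) -> Optional[str]:
    """Guess an encoding from a leading encoded U+FEFF, or return None."""
    for bom, encoding in _BOM_TABLE:
        if data.startswith(bom):
            return encoding
    return None


if __name__ == "__main__":
    # Hypothetical sample inputs.
    for sample in (b"\xef\xbb\xbf<html>", b"\xfe\xff\x00<", b"<html>"):
        print(sample[:4].hex(), sniff_bom(sample))
```

Note that a UTF-16LE page whose first character is U+0000 would be misread as UTF-32LE by this ordering; a real implementation would need to decide how to treat that ambiguity.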