Re: pre-HTML5 and the BOM

From: Doug Ewell <doug_at_ewellic.org>
Date: Sat, 14 Jul 2012 15:14:10 -0600

Philippe Verdy wrote:

> It would break if the only place where to place a BOM is just the
> start of a file. But as I propose, we allow BOMs to occur anywhere to
> specify which encoding to use to decode what follows each one, even
> shell scripts would work (you could place the BOM on a comment line
> after a hash symbol, that line still being below the initial hash-bang
> line). In that case, even the various UTFs would be mixable; extra BOMs
> would not hurt, and we would live without the legacy use of an
> unspecified encoding. That BOM would have to be recognized for any
> standard UTF (UTF-8, UTF-16 and UTF-32, and optionally CESU-8 if it
> helps; some platforms would even use their own compliant UTFs if it
> helps for better performance, for their internal handling within the
> boundaries of that platform)

U+FEFF is specifically defined as having the BOM semantic only when it
appears at the beginning of the file or stream. Everywhere else, it can
have only the ZWNBSP semantic. There are many good reasons for this.
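
To make the distinction concrete, here is a minimal Python sketch of a
decoder that follows this rule for UTF-8. The function name and the
sample bytes are made up for illustration; only codecs.BOM_UTF8 and
bytes.decode come from the standard library. A U+FEFF at offset 0 is
consumed as a signature, and any later U+FEFF is passed through as an
ordinary character.

```python
import codecs

def decode_utf8_leading_bom(data: bytes) -> str:
    """Decode UTF-8, treating U+FEFF as a BOM only at offset 0.

    Hypothetical helper for illustration, not a standard API.
    """
    if data.startswith(codecs.BOM_UTF8):
        # Leading signature: strip it, it is not part of the text.
        data = data[len(codecs.BOM_UTF8):]
    # Any U+FEFF that remains is kept as an ordinary (ZWNBSP) character.
    return data.decode("utf-8")

# Assumed sample: a BOM at the start, plus a stray U+FEFF on a comment line.
sample = codecs.BOM_UTF8 + "#!/bin/sh\n# \ufeffcomment\n".encode("utf-8")
text = decode_utf8_leading_bom(sample)
```

With this input, the leading U+FEFF is dropped and the one after the
hash symbol survives in the decoded text, where it says nothing about
the encoding of what follows.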

A related question, though, is why some people think the sky will fall
if a text file contains loose zero-width no-break spaces. U+FEFF is the
very model of a default ignorable code point.

--
Doug Ewell | Thornton, Colorado, USA
http://www.ewellic.org | @DougEwell
Received on Sun Jul 15 2012 - 06:36:45 CDT
