From: Doug Ewell (dewell@roadrunner.com)
Date: Wed May 28 2008 - 07:27:56 CDT
Marcin ‘Qrczak’ Kowalczyk <qrczak at knm dot org dot pl> wrote:
> A UTF-8 with a BOM is stateful: the decoder must remember whether it 
> has seen a BOM or whether it is past the beginning, and the encoder 
> must remember if it is at the beginning, to know whether to emit 
> U+FEFF twice for the case when the data begins with U+FEFF. A UTF-8 
> without any special treatment of U+FEFF at the beginning is stateless. 
> Both variants of UTF-8 are in use. It would be better to distinguish 
> them explicitly, like UTF-16 is distinguished from UTF-16BE & 
> UTF-16LE.
Nobody has yet shown me a realistic (non-contrived) scenario of Unicode 
data beginning with ZERO WIDTH NO-BREAK SPACE.  It would make no sense; 
the whole purpose of ZWNBSP as such is to be placed *between* two 
characters.  Certainly it can be done, just as a diaeresis can be 
positioned after a control character, but it's not realistic.
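
For anyone who wants to see the distinction Marcin describes in practice, here is a minimal sketch. It assumes Python's standard "utf-8" and "utf-8-sig" codecs as stand-ins for the two variants; neither codec is mentioned in the thread, and the sample string is purely illustrative.

```python
import codecs

ZWNBSP = "\ufeff"          # U+FEFF, ZERO WIDTH NO-BREAK SPACE (also used as the BOM)
text = ZWNBSP + "abc"      # contrived data that really does begin with U+FEFF

# Stateless variant: U+FEFF is an ordinary character and round-trips as-is.
plain = text.encode("utf-8")
plain_roundtrip = plain.decode("utf-8")

# BOM variant: the encoder prepends a signature, so the bytes for U+FEFF
# appear twice at the start; the decoder strips only the first occurrence.
signed = text.encode("utf-8-sig")
signed_roundtrip = signed.decode("utf-8-sig")

# Mixing the variants loses the leading character: a BOM-aware decoder
# treats the real U+FEFF in the plain stream as a signature and drops it.
mixed = plain.decode("utf-8-sig")

# The decoder state is visible when the same three bytes are fed twice.
decoder = codecs.getincrementaldecoder("utf-8-sig")()
first_chunk = decoder.decode(codecs.BOM_UTF8)    # at the beginning: consumed as BOM
second_chunk = decoder.decode(codecs.BOM_UTF8)   # past the beginning: kept as U+FEFF
```

The same byte sequence EF BB BF yields nothing on the first call and U+FEFF on the second, which is the "must remember whether it is past the beginning" behaviour in the quoted text. Whether that matters in practice depends on whether real data ever starts with U+FEFF, which is the point at issue.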
--
Doug Ewell  *  Arvada, Colorado, USA  *  RFC 4645  *  UTN #14
http://www.ewellic.org
http://www1.ietf.org/html.charters/ltru-charter.html
http://www.alvestrand.no/mailman/listinfo/ietf-languages