From: Doug Ewell (dewell@roadrunner.com)
Date: Wed May 28 2008 - 07:27:56 CDT
Marcin ‘Qrczak’ Kowalczyk <qrczak at knm dot org dot pl> wrote:
> A UTF-8 with a BOM is stateful: the decoder must remember whether it
> has seen a BOM or whether it is past the beginning, and the encoder
> must remember if it is at the beginning, to know whether to emit
> U+FEFF twice for the case when the data begins with U+FEFF. A UTF-8
> without any special treatment of U+FEFF at the beginning is stateless.
> Both variants of UTF-8 are in use. It would be better to distinguish
> them explicitly, like UTF-16 is distinguished from UTF-16BE &
> UTF-16LE.
Nobody has yet shown me a realistic (non-contrived) scenario of Unicode
data beginning with ZERO-WIDTH NO-BREAK SPACE. It would make no sense;
the whole purpose of ZWNBSP as such is to be placed *between* two
characters. Certainly it can be done, just as a diaeresis can be
positioned after a control character, but it's not realistic.
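The stateful behavior described in the quoted message can be sketched in Python. This is a minimal illustration, not a reference implementation. It assumes a "UTF-8 with signature" variant that writes exactly one leading U+FEFF on output and drops exactly one on input. The class names SigEncoder and SigDecoder are hypothetical. The encoder has to remember whether it is at the start of the stream. The decoder has to remember whether it has seen the first character yet. Plain `bytes.decode("utf-8")` needs neither piece of state.

```python
import codecs

BOM = "\ufeff"                   # U+FEFF ZERO WIDTH NO-BREAK SPACE, used as a signature
BOM_BYTES = BOM.encode("utf-8")  # b"\xef\xbb\xbf"


class SigEncoder:
    """Hypothetical 'UTF-8 with signature' encoder.

    State: whether we are still at the start of the stream. The signature is
    written once, before any text, so text that itself begins with U+FEFF
    ends up with that character encoded twice in a row.
    """

    def __init__(self) -> None:
        self._at_start = True

    def encode(self, text: str) -> bytes:
        prefix = b""
        if self._at_start:
            prefix = BOM_BYTES
            self._at_start = False
        return prefix + text.encode("utf-8")


class SigDecoder:
    """Hypothetical 'UTF-8 with signature' decoder.

    State: whether the first character has been seen yet, plus the inner
    decoder's buffer for UTF-8 sequences split across chunks. Only a U+FEFF
    in the very first character position is dropped. Later ones are kept.
    """

    def __init__(self) -> None:
        self._inner = codecs.getincrementaldecoder("utf-8")()
        self._at_start = True

    def decode(self, data: bytes, final: bool = False) -> str:
        text = self._inner.decode(data, final)
        if self._at_start and text:
            self._at_start = False
            if text[0] == BOM:
                text = text[1:]
        return text


if __name__ == "__main__":
    enc = SigEncoder()
    data = enc.encode("\ufeffabc") + enc.encode("def")
    print(data)  # b'\xef\xbb\xbf\xef\xbb\xbfabcdef'

    # Feed one byte at a time: the decoder must carry state between calls.
    dec = SigDecoder()
    text = "".join(dec.decode(data[i:i + 1]) for i in range(len(data)))
    text += dec.decode(b"", final=True)
    print(repr(text))  # '\ufeffabcdef'

    # Stateless UTF-8 keeps every U+FEFF, including the signature.
    print(repr(data.decode("utf-8")))  # '\ufeff\ufeffabcdef'
```

Python's standard `utf-8-sig` codec follows the same convention. `"\ufeffabc".encode("utf-8-sig")` yields two signature byte sequences, and decoding with `utf-8-sig` strips only the first one.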
--
Doug Ewell * Arvada, Colorado, USA * RFC 4645 * UTN #14
http://www.ewellic.org
http://www1.ietf.org/html.charters/ltru-charter.html
http://www.alvestrand.no/mailman/listinfo/ietf-languages
This archive was generated by hypermail 2.1.5 : Wed May 28 2008 - 07:30:13 CDT