From: Doug Ewell (dewell@adelphia.net)
Date: Sun Feb 04 2007 - 15:46:29 CST
Philippe Verdy <verdy underscore p at wanadoo dot fr> wrote:
>> I'm sure it would not be difficult to edit Section 2.5 to explain
>> this, something like:
>>
>> "An initial U+FEFF is encoded in BOCU-1 with the three bytes FB EE
>> 28. Note that adding or stripping an initial U+FEFF generally
>> requires the next code point above U+0020 to be re-encoded."
>
> ... unless a C0 control character (below U+0020) occurs before that
> code point (above U+0020). There's no re-encoding if the first
> non-SPACE character after the leading BOM is a control such as an
> end-of-line sequence or a tab, or if it's a character in the
> U+FE80..U+FEFF range.
Correct. Phrasing this in a clear and succinct way is left as an
exercise.
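
For illustration, here is a minimal Python sketch of the state behaviour
discussed above. It is not a reference implementation. The constants and
the state-update rule are transcribed from the UTN #6 description of
BOCU-1 and should be treated as assumptions to check against that note.
The names bocu1_encode, pack_diff, next_prev and trail_byte are
hypothetical and exist only for this example.

```python
# Sketch of a BOCU-1 encoder (assumed constants, per UTN #6).
ASCII_PREV = 0x40          # initial state, and state after a C0 control
MIDDLE = 0x90              # lead byte that encodes a difference of zero
TRAIL_COUNT = 243          # number of usable trail byte values
TRAIL_CONTROLS = 20        # trail values mapped onto low byte values
TRAIL_OFFSET = 13          # offset for trail values >= TRAIL_CONTROLS
REACH_POS_1, REACH_NEG_1 = 63, -64
REACH_POS_2, REACH_NEG_2 = 10512, -10513
REACH_POS_3, REACH_NEG_3 = 187659, -187660
START_POS_2, START_POS_3, START_POS_4 = 0xD0, 0xFB, 0xFE
START_NEG_2, START_NEG_3, START_NEG_4 = 0x50, 0x25, 0x22
TRAIL_TO_BYTE = [0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x10, 0x11, 0x12, 0x13,
                 0x14, 0x15, 0x16, 0x17, 0x18, 0x19, 0x1C, 0x1D, 0x1E, 0x1F]


def trail_byte(t):
    """Map a trail value 0..242 to its byte value."""
    return t + TRAIL_OFFSET if t >= TRAIL_CONTROLS else TRAIL_TO_BYTE[t]


def next_prev(c):
    """State to use after encoding code point c (above U+0020)."""
    if 0x3040 <= c <= 0x309F:            # Hiragana
        return 0x3070
    if 0x4E00 <= c <= 0x9FA5:            # CJK Unihan
        return 0x4E00 - REACH_NEG_2
    if 0xAC00 <= c <= 0xD7A3:            # Hangul syllables
        return (0xD7A3 + 0xAC00) // 2
    return (c & ~0x7F) + ASCII_PREV      # middle of the 128-code-point block


def pack_diff(diff):
    """Encode a signed difference as one to four bytes."""
    if REACH_NEG_1 <= diff <= REACH_POS_1:
        return bytes([MIDDLE + diff])
    if diff > 0:
        if diff <= REACH_POS_2:
            diff -= REACH_POS_1 + 1
            lead, count = START_POS_2, 1
        elif diff <= REACH_POS_3:
            diff -= REACH_POS_2 + 1
            lead, count = START_POS_3, 2
        else:
            diff -= REACH_POS_3 + 1
            lead, count = START_POS_4, 3
    else:
        if diff >= REACH_NEG_2:
            diff -= REACH_NEG_1
            lead, count = START_NEG_2, 1
        elif diff >= REACH_NEG_3:
            diff -= REACH_NEG_2
            lead, count = START_NEG_3, 2
        else:
            diff -= REACH_NEG_3
            lead, count = START_NEG_4, 3
    trail = []
    for _ in range(count):
        # divmod floors, so negative differences yield a remainder in 0..242
        diff, m = divmod(diff, TRAIL_COUNT)
        trail.append(trail_byte(m))
    return bytes([lead + diff] + trail[::-1])


def bocu1_encode(text):
    """Encode a str as BOCU-1 bytes, without adding any signature."""
    out = bytearray()
    prev = ASCII_PREV
    for ch in text:
        c = ord(ch)
        if c <= 0x20:
            if c != 0x20:                # SPACE leaves the state untouched
                prev = ASCII_PREV        # C0 controls reset the state
            out.append(c)
        else:
            out += pack_diff(c - prev)
            prev = next_prev(c)
    return bytes(out)


print(bocu1_encode("\ufeff").hex(" "))                          # signature
print(bocu1_encode("\ufeffA")[3:] == bocu1_encode("A"))         # BOM, then letter
print(bocu1_encode("\ufeff\nA")[3:] == bocu1_encode("\nA"))     # BOM, then control
```

The first line printed should be the signature bytes quoted above. The
two comparisons show whether the bytes after the signature match the
BOM-less encoding: the state left behind by U+FEFF changes the encoding
of a following letter, while an intervening C0 control resets the state
first, so the text after it is unaffected.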
--
Doug Ewell * Fullerton, California, USA * RFC 4645 * UTN #14
http://users.adelphia.net/~dewell/
http://www1.ietf.org/html.charters/ltru-charter.html
http://www.alvestrand.no/mailman/listinfo/ietf-languages