Re: Split a UTF-8 multi-octet sequence such that it cannot be unambiguously restored?

From: J Decker via Unicode <unicode_at_unicode.org>
Date: Mon, 24 Jul 2017 12:12:06 -0700

On Mon, Jul 24, 2017 at 10:57 AM, Costello, Roger L. via Unicode <
unicode_at_unicode.org> wrote:

> Hi Folks,
>
> 2. (Bug) The sending application performs the folding process - inserts
> CRLF plus white space characters - and the receiving application does the
> unfolding process but doesn't properly delete all of them.
>
The RFC doesn't say 'characters'; it says either a space or a tab character
(singular).
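A minimal C sketch of unfolding under that rule, assuming RFC 5545-style folding (a CRLF followed by exactly one SPACE or HTAB); `unfold_lines` is a hypothetical name, not from any library. It deletes exactly those three octets and nothing else, so any further whitespace survives as content. Because it works on octets rather than characters, a multi-octet UTF-8 sequence that was split by the fold is rejoined byte for byte.

```c
#include <stddef.h>

/* Hypothetical helper (assumes RFC 5545-style folding): unfold buf in
   place. Each CRLF immediately followed by one SPACE or HTAB is removed
   together with that single whitespace octet; everything else is kept.
   buf must be NUL-terminated (len + 1 writable octets). Returns the new
   length. */
size_t unfold_lines(char *buf, size_t len)
{
    size_t in = 0, out = 0;

    while (in < len) {
        if (in + 2 < len && buf[in] == '\r' && buf[in + 1] == '\n' &&
            (buf[in + 2] == ' ' || buf[in + 2] == '\t')) {
            in += 3;            /* drop CR, LF and exactly one SPACE/HTAB */
            continue;
        }
        buf[out++] = buf[in++]; /* ordinary octet: copy it down */
    }
    buf[out] = '\0';
    return out;
}
```

Deleting only one whitespace octet is the point of the "(singular)" remark: a receiver that strips all following whitespace corrupts content that legitimately began with a space.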

Back-scanning to a character boundary is simple enough:

/* step back over UTF-8 continuation bytes (10xxxxxx) to the lead byte */
while( ( from[0] & 0xC0 ) == 0x80 )
    from--;

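It should probably also check that from > (start+1), so the scan can't run
back past the start of the buffer, but since the split is applied about 75
octets into the line and a UTF-8 sequence is at most 4 octets long, that
condition is implicitly true.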
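Putting the back scan into a complete folding routine might look like the sketch below. `fold_utf8`, `is_cont` and the `limit` parameter are hypothetical names; the sketch assumes valid UTF-8 input with no CR or LF, and assumes the leading SPACE of a continuation line counts toward the limit (75 octets in RFC 5545). Here the guard is written as `from > p`, so the scan never backs past the start of the current chunk.

```c
#include <stddef.h>
#include <string.h>

/* Nonzero if c is a UTF-8 continuation byte (10xxxxxx). */
static int is_cont(unsigned char c) { return (c & 0xC0) == 0x80; }

/* Hypothetical helper: fold len octets of valid UTF-8 from in into out
   (capacity cap), inserting CRLF + SPACE so no physical line exceeds
   limit octets, SPACE included. Never splits a multi-octet sequence.
   Returns the output length, or (size_t)-1 on error or overflow. */
size_t fold_utf8(const char *in, size_t len, char *out, size_t cap,
                 size_t limit)
{
    const unsigned char *p = (const unsigned char *)in;
    const unsigned char *end = p + len;
    size_t o = 0;
    int first = 1;

    /* limit >= 5 leaves room for a whole 4-octet sequence plus SPACE */
    if (limit < 5 || cap == 0)
        return (size_t)-1;

    while (p < end) {
        size_t room = first ? limit : limit - 1; /* SPACE takes one */
        const unsigned char *from = p + room;

        if (from >= end) {
            from = end;                 /* the rest fits on this line */
        } else {
            /* back-scan from the tentative split to a lead byte */
            while (from > p && is_cont(from[0]))
                from--;
        }

        size_t n = (size_t)(from - p);
        if (n == 0)
            return (size_t)-1;          /* invalid input: no progress */
        if (o + n + (first ? 0 : 3) + 1 > cap)
            return (size_t)-1;          /* +3 for CRLF SPACE, +1 for NUL */

        if (!first) {
            out[o++] = '\r';
            out[o++] = '\n';
            out[o++] = ' ';
        }
        memcpy(out + o, p, n);
        o += n;
        p = from;
        first = 0;
    }
    out[o] = '\0';
    return o;
}
```

In real use limit would be 75; a tiny limit such as 5 just makes the behaviour easy to see: folding "abcd" followed by U+00E9 (0xC3 0xA9) and "xyz" moves the whole 0xC3 0xA9 pair onto the second line instead of splitting it after 0xC3.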
Received on Mon Jul 24 2017 - 14:12:33 CDT
