> On 16 May 2017, at 20:43, Richard Wordingham via Unicode <unicode_at_unicode.org> wrote:
>
> On Tue, 16 May 2017 11:36:39 -0700
> Markus Scherer via Unicode <unicode_at_unicode.org> wrote:
>
>> Why do we care how we carve up an illegal sequence into subsequences?
>> Only for debugging and visual inspection. Maybe some process is using
>> illegal, overlong sequences to encode something special (à la Java
>> string serialization, "modified UTF-8"), and for that it might be
>> convenient too to treat overlong sequences as single errors.
>
> I think that's not quite true. If we are moving back and forth through
> a buffer containing corrupt text, we need to make sure that moving three
> characters forward and then three characters back leaves us where we
> started. That requires internal consistency.

That’s very true. But the proposed change doesn’t actually affect that; it’s still the case that you can correctly identify boundaries in both directions.
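
The following is a minimal sketch in Python (not code from this thread) of why the boundary property survives the change. It assumes two segmentation policies. The first is the Unicode Standard's existing "maximal subpart" practice. The second, `next_structural`, is a hypothetical policy written for illustration: a lead byte absorbs as many continuation bytes as its bit pattern calls for, so an overlong sequence such as C0 80 counts as a single error. Under both policies every byte outside 80..BF starts a new unit and no unit is longer than four bytes. A backward step can therefore be derived from the forward rule by looking back at most four bytes, so stepping forward and then back lands on the same positions.

```python
def is_cont(b: int) -> bool:
    """True for UTF-8 continuation bytes (80..BF)."""
    return 0x80 <= b <= 0xBF


def next_maximal_subpart(buf: bytes, i: int) -> int:
    """Forward step using maximal subparts (well-formed ranges of Unicode Table 3-7)."""
    lead = buf[i]
    if lead <= 0x7F:
        return i + 1
    if 0xC2 <= lead <= 0xDF:
        n = 2
    elif 0xE0 <= lead <= 0xEF:
        n = 3
    elif 0xF0 <= lead <= 0xF4:
        n = 4
    else:
        return i + 1  # 80..C1 and F5..FF never start a well-formed sequence
    # Restricted second-byte ranges exclude overlongs, surrogates and > U+10FFFF.
    lo, hi = {0xE0: (0xA0, 0xBF), 0xED: (0x80, 0x9F),
              0xF0: (0x90, 0xBF), 0xF4: (0x80, 0x8F)}.get(lead, (0x80, 0xBF))
    j = i + 1
    while j < len(buf) and j - i < n:
        ok = (lo <= buf[j] <= hi) if j == i + 1 else is_cont(buf[j])
        if not ok:
            break
        j += 1
    return j


def next_structural(buf: bytes, i: int) -> int:
    """Hypothetical policy: the lead byte's bit pattern alone fixes the unit
    length, so C0 80 or E0 80 80 becomes one error unit."""
    lead = buf[i]
    if 0xC0 <= lead <= 0xDF:
        n = 2
    elif 0xE0 <= lead <= 0xEF:
        n = 3
    elif 0xF0 <= lead <= 0xF7:
        n = 4
    else:
        n = 1  # ASCII, stray continuation bytes, F8..FF
    j = i + 1
    while j < len(buf) and j - i < n and is_cont(buf[j]):
        j += 1
    return j


def prev_boundary(buf: bytes, e: int, step) -> int:
    """Backward step from boundary e, derived from any forward `step` in which
    only continuation bytes follow the first byte of a unit and units are <= 4 bytes."""
    j = e - 1
    while j > 0 and e - j < 4 and is_cont(buf[j]):
        j -= 1
    # A non-continuation byte always starts a unit; if its unit reaches e, it is
    # the previous unit. Otherwise byte e-1 is a stray continuation byte.
    if not is_cont(buf[j]) and step(buf, j) == e:
        return j
    return e - 1


def forward_boundaries(buf: bytes, step) -> list:
    pos = [0]
    while pos[-1] < len(buf):
        pos.append(step(buf, pos[-1]))
    return pos


def backward_boundaries(buf: bytes, step) -> list:
    pos = [len(buf)]
    while pos[-1] > 0:
        pos.append(prev_boundary(buf, pos[-1], step))
    return pos[::-1]


# ASCII, overlong C0 80, overlong E0 80 80, surrogate ED A0 80,
# valid U+1F600, a stray 80, and a truncated E2 82 at the end.
SAMPLE = b"A\xc0\x80\xe0\x80\x80\xed\xa0\x80\xf0\x9f\x98\x80\x80\xe2\x82"

if __name__ == "__main__":
    for name, step in [("maximal subpart", next_maximal_subpart),
                       ("structural", next_structural)]:
        fwd = forward_boundaries(SAMPLE, step)
        print(name, len(fwd) - 1, fwd == backward_boundaries(SAMPLE, step))
```

For the sample buffer the maximal-subpart policy yields 12 units and the structural policy 7, and in both cases the forward and backward boundary lists agree. The choice of policy changes how many errors are reported, not whether iteration is reversible.
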
Kind regards,
Alastair.
-- http://alastairs-place.net

Received on Wed May 17 2017 - 03:08:15 CDT