Re: Utility to report and repair broken surrogate pairs in UTF-16 text

From: Doug Ewell (doug@ewellic.org)
Date: Fri Nov 05 2010 - 08:02:34 CST

Next message: Asmus Freytag: "Re: Utility to report and repair broken surrogate pairs in UTF-16 text"

Previous message: Martin J. Dürst: "Re: Utility to report and repair broken surrogate pairs in UTF-16 text"
Maybe in reply to: Jim Monty: "Utility to report and repair broken surrogate pairs in UTF-16 text"
Next in thread: Asmus Freytag: "Re: Utility to report and repair broken surrogate pairs in UTF-16 text"
Reply: Asmus Freytag: "Re: Utility to report and repair broken surrogate pairs in UTF-16 text"

Asmus Freytag <asmusf at ix dot netcom dot com> wrote:

>> I'm probably missing something here, but I don't agree that it's OK
>> for a consumer of UTF-16 to accept an unpaired surrogate without
>> throwing an error, or converting it to U+FFFD, or otherwise raising a
>> fuss. Unpaired surrogates are ill-formed, and have to be caught and
>> dealt with.
>
> The question is whether you want every library that handles strings
> to perform the equivalent of a citizen's arrest, or whether you architect
> things so that the gatekeepers (border control) police the data stream.

If you can have upstream libraries check for unpaired surrogates at the
time they convert UTF-16 to Unicode code points, then your point is well
taken, because then the downstream libraries are no longer dealing with
UTF-16, but with code points. Doing conversion and validation at
different stages isn't a great idea: data that passes a check in one
form can be interpreted differently after conversion, and that's how
character encodings get involved in security problems.
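As an illustration of doing conversion and validation in the same step, here is a minimal Python sketch, assuming the UTF-16 text is already available as a list of 16-bit code unit integers. The function name decode_utf16_units and its errors parameter are hypothetical; the two modes mirror the choices mentioned above, either raising an error or substituting U+FFFD for each unpaired surrogate.

```python
REPLACEMENT = 0xFFFD  # U+FFFD REPLACEMENT CHARACTER


def decode_utf16_units(units, errors="strict"):
    """Hypothetical helper: convert 16-bit code units to code points in one pass.

    Validation happens inside the same loop as conversion, so callers that
    only see the returned code points can never receive a lone surrogate.
    errors="strict" raises ValueError; errors="replace" substitutes U+FFFD.
    """
    if errors not in ("strict", "replace"):
        raise ValueError("errors must be 'strict' or 'replace'")
    code_points = []
    i = 0
    n = len(units)
    while i < n:
        u = units[i]
        if 0xD800 <= u <= 0xDBFF:
            # High (lead) surrogate: valid only if a low surrogate follows.
            if i + 1 < n and 0xDC00 <= units[i + 1] <= 0xDFFF:
                low = units[i + 1]
                code_points.append(0x10000 + ((u - 0xD800) << 10) + (low - 0xDC00))
                i += 2
                continue
            problem = "unpaired high surrogate"
        elif 0xDC00 <= u <= 0xDFFF:
            # Low (trail) surrogate with no high surrogate before it.
            problem = "unpaired low surrogate"
        else:
            # Ordinary BMP code unit: it is its own code point.
            code_points.append(u)
            i += 1
            continue
        if errors == "strict":
            raise ValueError(f"{problem} 0x{u:04X} at index {i}")
        # Replace only the bad unit; the next unit is examined normally.
        code_points.append(REPLACEMENT)
        i += 1
    return code_points
```

For example, decode_utf16_units([0x41, 0xD83D, 0xDE00]) yields [0x41, 0x1F600], while [0x41, 0xDC00, 0x42] raises ValueError in strict mode and yields [0x41, 0xFFFD, 0x42] with errors="replace". Python's own bytes.decode("utf-16-le") likewise rejects unpaired surrogates by default.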

Corrigendum #1 (UTF-8 Shortest Form) closed the door on interpreting
invalid UTF-8 sequences. I'm not sure why the approach to handling
UTF-16 should be any different.
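For comparison, a minimal Python sketch of what that rule means for a UTF-8 decoder, again with a hypothetical function name (decode_utf8_strict): non-shortest-form input such as C0 AF, an overlong encoding of "/", is rejected rather than decoded. The sketch also rejects encoded surrogates and values above U+10FFFF; it is an illustration, not a complete conformant decoder (for instance, it reports the whole sequence rather than the exact offending byte).

```python
def decode_utf8_strict(data: bytes) -> list:
    """Hypothetical helper: decode UTF-8, refusing ill-formed input.

    Rejects overlong (non-shortest-form) sequences, encoded surrogates,
    values above U+10FFFF, bad lead or continuation bytes, and truncation.
    """
    code_points = []
    i = 0
    n = len(data)
    while i < n:
        b0 = data[i]
        if b0 < 0x80:
            code_points.append(b0)  # ASCII byte maps directly
            i += 1
            continue
        # Lead byte determines length, initial bits, and the smallest value
        # that legitimately needs this many bytes (the shortest-form bound).
        if 0xC0 <= b0 <= 0xDF:
            length, cp, minimum = 2, b0 & 0x1F, 0x80
        elif 0xE0 <= b0 <= 0xEF:
            length, cp, minimum = 3, b0 & 0x0F, 0x800
        elif 0xF0 <= b0 <= 0xF7:
            length, cp, minimum = 4, b0 & 0x07, 0x10000
        else:
            raise ValueError(f"invalid lead byte 0x{b0:02X} at offset {i}")
        if i + length > n:
            raise ValueError(f"truncated sequence at offset {i}")
        for b in data[i + 1:i + length]:
            if not 0x80 <= b <= 0xBF:
                raise ValueError(f"invalid continuation byte at offset {i}")
            cp = (cp << 6) | (b & 0x3F)
        if cp < minimum:
            # This is the case Corrigendum #1 forbids interpreting.
            raise ValueError(f"non-shortest form for U+{cp:04X} at offset {i}")
        if 0xD800 <= cp <= 0xDFFF or cp > 0x10FFFF:
            raise ValueError(f"invalid scalar value 0x{cp:X} at offset {i}")
        code_points.append(cp)
        i += length
    return code_points
```

Python's built-in bytes.decode("utf-8") applies the same strictness by default; the point of the sketch is only to show that the check belongs in the decoder itself.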

--
Doug Ewell | Thornton, Colorado, USA | http://www.ewellic.org
RFC 5645, 4645, UTN #14 | ietf-languages @ is dot gd slash 2kf0s


This archive was generated by hypermail 2.1.5 : Fri Nov 05 2010 - 08:07:39 CST