RE: Utility to report and repair broken surrogate pairs in UTF-16 text

From: Doug Ewell (doug@ewellic.org)
Date: Fri Nov 05 2010 - 14:56:09 CST

Next message: Markus Scherer: "Re: Utility to report and repair broken surrogate pairs in UTF-16 text"

Previous message: Mark Davis ☕: "Re: Utility to report and repair broken surrogate pairs in UTF-16 text"
Maybe in reply to: Jim Monty: "Utility to report and repair broken surrogate pairs in UTF-16 text"
Next in thread: Markus Scherer: "Re: Utility to report and repair broken surrogate pairs in UTF-16 text"
Reply: Markus Scherer: "Re: Utility to report and repair broken surrogate pairs in UTF-16 text"

Asmus Freytag <asmusf at ix dot netcom dot com> wrote:

>> Doing conversion and validation at different stages isn't a great
>> idea; that's how character encodings get involved with security
>> problems.
>
> Note that I am careful not to suggest that (and I'm sure Markus isn't
> either). "Handling" includes much more than code conversion. It
> includes uppercasing, spell checking, sorting, searching, the whole
> lot. Burdening every single one of those tasks with policing the
> integrity of the encoding seems wasteful, and, as I tried to explain,
> puts the error detection in a place where you'll be most likely
> prevented from doing something useful in recovery.

Right, but as I said, those downstream tasks shouldn't be consumers of
UTF-16 code units anyway. They should be consumers of Unicode code
points (strictly, scalar values), which by definition exclude loose
surrogates.
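
As an illustration of validating once at the boundary, here is a minimal sketch in Python. It is not code from this thread: the function name decode_utf16_units, the choice to substitute U+FFFD, and the returned list of error positions are all assumptions made for the example. It turns a sequence of 16-bit code units into scalar values, so that anything downstream never sees a loose surrogate.

```python
from typing import Iterable, List, Tuple

REPLACEMENT = 0xFFFD  # U+FFFD REPLACEMENT CHARACTER (assumed repair policy)


def decode_utf16_units(units: Iterable[int]) -> Tuple[List[int], List[int]]:
    """Convert UTF-16 code units to scalar values, validating as we go.

    Returns (scalars, bad_indexes). bad_indexes holds the positions of
    loose surrogates in the input; each one becomes U+FFFD in scalars.
    """
    units = list(units)
    scalars: List[int] = []
    bad_indexes: List[int] = []
    i = 0
    while i < len(units):
        u = units[i]
        if 0xD800 <= u <= 0xDBFF:  # high (lead) surrogate
            if i + 1 < len(units) and 0xDC00 <= units[i + 1] <= 0xDFFF:
                low = units[i + 1]
                # Well-formed pair: combine into a supplementary code point.
                scalars.append(0x10000 + ((u - 0xD800) << 10) + (low - 0xDC00))
                i += 2
                continue
            # Lead surrogate not followed by a trail surrogate.
            bad_indexes.append(i)
            scalars.append(REPLACEMENT)
        elif 0xDC00 <= u <= 0xDFFF:  # low (trail) surrogate with no lead
            bad_indexes.append(i)
            scalars.append(REPLACEMENT)
        else:
            scalars.append(u)  # ordinary BMP code unit
        i += 1
    return scalars, bad_indexes
```

Calling decode_utf16_units([0x0041, 0xD83D, 0xDE00, 0xD800, 0x0042, 0xDC00]) would report the loose surrogates at input positions 3 and 5 and hand the rest of the pipeline only scalar values. Uppercasing, sorting, searching and the like can then work on that output without each re-checking the encoding.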

--
Doug Ewell | Thornton, Colorado, USA | http://www.ewellic.org
RFC 5645, 4645, UTN #14 | ietf-languages @ is dot gd slash 2kf0s

