From: Doug Ewell (doug@ewellic.org)
Date: Fri Nov 05 2010 - 14:56:09 CST
Asmus Freytag <asmusf at ix dot netcom dot com> wrote:
>> Doing conversion and validation at different stages isn't a great
>> idea; that's how character encodings get involved with security
>> problems.
>
> Note that I am careful not to suggest that (and I'm sure Markus isn't
> either). "Handling" includes much more than code conversion. It
> includes uppercasing, spell checking, sorting, searching, the whole
> lot. Burdening every single one of those tasks with policing the
> integrity of the encoding seems wasteful, and, as I tried to explain,
> puts the error detection in a place where you'll be most likely
> prevented from doing something useful in recovery.
Right, but as I said, those downstream tasks shouldn't be consumers of
UTF-16 code units anyway. They should be consumers of Unicode code
points, which by definition exclude loose surrogates.
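
A minimal sketch of that split, in Python: validate once while converting
UTF-16 code units to code points, so that everything downstream never sees
a loose surrogate. The function name decode_utf16_units and the choice to
raise ValueError are assumptions for illustration, not anything from the
thread. Strictly speaking, the values it returns are what the standard
calls Unicode scalar values, i.e. code points outside D800..DFFF.

```python
def decode_utf16_units(units):
    """Convert UTF-16 code units (ints in 0..0xFFFF) to a list of scalar values.

    Hypothetical helper: raises ValueError on any unpaired surrogate, so the
    integrity check happens here and nowhere else.
    """
    out = []
    it = iter(units)
    for u in it:
        if 0xD800 <= u <= 0xDBFF:
            # High surrogate: the next unit must be a low surrogate.
            low = next(it, None)
            if low is None or not (0xDC00 <= low <= 0xDFFF):
                raise ValueError(f"unpaired high surrogate {u:04X}")
            out.append(0x10000 + ((u - 0xD800) << 10) + (low - 0xDC00))
        elif 0xDC00 <= u <= 0xDFFF:
            # Low surrogate with no preceding high surrogate.
            raise ValueError(f"unpaired low surrogate {u:04X}")
        else:
            out.append(u)
    return out
```

Uppercasing, sorting, searching and so on would then take the returned
list (or an equivalent validated string type) and need no surrogate checks
of their own. Whether to raise, substitute U+FFFD, or do something else on
bad input is a policy choice this sketch does not try to settle.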
--
Doug Ewell | Thornton, Colorado, USA | http://www.ewellic.org
RFC 5645, 4645, UTN #14 | ietf-languages @ is dot gd slash 2kf0s