>> Wouldn't the clean way be to ensure valid strings (only) when they're
>> built
>
> Of course, the earlier erroneous data gets caught, the better. The problem
> is that error checking is expensive, both in lines of code and in execution
> time (I think there is data showing that in any real-life programs, more
> than 50% or 80% or so is error checking, but I forgot the details).
>
> So indeed as Ken has explained with a very good example, it doesn't make
> sense to check at every corner.
What I meant: the idea was to check only when a string is constructed. As
soon as it has been fed into a collation (or whatever other) algorithm, the
algorithm should assume the input is well-formed and shouldn't do any more
error-checking, yes.
Not having facilities for dealing with ill-formed values ("U+"D800 ..
"U+"DFFF) in an algorithm will surely make *something* faster, even if it's
only that some indirectly used table has fewer entries.
What I had in mind is a library where the public interface only ever allows
Unicode scalar values as input and output. This will lead to a cleaner
interface. A data structure that can hold surrogate values can and should
be used algorithm-*internally*, if that makes things more efficient, safer,
etc.
> Convenience of implementation is an important aspect in programming.
For a user yes, but not for a library writer/maintainer, I would suggest.
The STL uses red-black trees; these are annoyingly difficult to implement
but invisible to the user.
Stephan
Received on Tue Jan 08 2013 - 05:09:27 CST