Re: UTF-8 Corrigendum, new Glossary

From: David Starner (dvdeug@x8b4e516e.dhcp.okstate.edu)
Date: Thu Nov 30 2000 - 20:47:25 EST


On Thu, Nov 30, 2000 at 04:48:56PM -0800, G. Adam Stanislav wrote:
> If the source (in Ister) uses illegal but decipherable UTF-8, my
> software accepts it. Naturally, before it sends it out it transforms
> it to perfectly legal UTF-8. The idea I should reject it is silly
> (and, no, the "internal data" clause does not apply here: my software
> accepts data from an external source). Rejecting it would mean
> that if the web page designer used some design software that messed
> up the UTF-8 encoding, the web page would suddenly miss a letter here,
> a letter there.

It could do a lot more than that - if the encoding is messed up, you
could be getting anything, from Latin-1, to UTF-16, to pure noise. So
why does this particular mistake deserve special handling when those
don't? It's the responsibility of the design software to get it right,
not your code's obligation to try to understand it.
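
To make that concrete, here is a rough Python sketch (the names are
made up for illustration, nothing here comes from Ister). The same two
bytes are legal UTF-8, legal Latin-1, and a legal UTF-16 code unit, and
each guess gives a different answer with no error to warn the reader:

```python
# Two bytes that form the legal UTF-8 encoding of U+00E9 (e with acute).
raw = b"\xc3\xa9"

def decode_guesses(data: bytes) -> dict:
    """Decode the same bytes under three different encoding guesses."""
    return {
        "utf-8": data.decode("utf-8"),
        # Latin-1 maps every byte to a character, so it never fails.
        "latin-1": data.decode("latin-1"),
        # Little-endian UTF-16 reads the pair as one 16-bit code unit.
        "utf-16-le": data.decode("utf-16-le"),
    }

results = decode_guesses(raw)
for name, text in results.items():
    # Print code points rather than glyphs so the output is unambiguous.
    print(name, [hex(ord(ch)) for ch in text])
```

Nothing in the bytes themselves says which reading was intended, so a
reader that starts guessing has no obvious place to stop.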

> Not rejecting it poses no security risk, so, for this
> specific application it is better to accept it (and correct it) than
> to reject it.

Is that your rule in all cases, to try to guess what they meant and do
that? It'll be hell on anyone who later has to interpret Ister if
there's a large body of code that follows no standard, but was accepted
by the original interpreter. (Or even later versions of that
interpreter - I've hung around the gcc lists long enough to know that
people don't like "that's no longer supported" or even "that was never
officially supported.")

Even if it works fine in the case of your interpreter, it'll cause
problems when the same source gets fed through a UTF-8-conformant (or
non-multibyte-aware) text tool that won't interpret over-long
sequences. Especially non-multibyte-aware tools, since they will seem
to work and silently get stuff wrong. It seems better just to refuse
it, and force the buggy software to get fixed, than to have a bunch of
obscure bugs show up later.
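
A minimal Python sketch of the mismatch, under some assumptions: the
over-long form used is the two-byte sequence 0xC0 0xAF for "/"
(U+002F), and lenient_decode is a hypothetical stand-in for a decoder
that accepts such forms, not Ister's actual code.

```python
# Over-long (non-shortest-form) two-byte encoding of "/" (U+002F).
OVERLONG_SLASH = b"\xc0\xaf"
data = b"dir" + OVERLONG_SLASH + b"file"

def strict_decode(raw: bytes):
    """Return the decoded text, or None if raw is not legal UTF-8."""
    try:
        # Python's UTF-8 codec rejects over-long sequences.
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return None

def lenient_decode(raw: bytes) -> str:
    """Hypothetical lenient decoder: ASCII plus any two-byte form,
    including over-long ones. Illustration only."""
    out = []
    i = 0
    while i < len(raw):
        b = raw[i]
        if b < 0x80:
            out.append(chr(b))
            i += 1
        elif 0xC0 <= b <= 0xDF and i + 1 < len(raw) and 0x80 <= raw[i + 1] <= 0xBF:
            # Combine 5 payload bits of the lead byte with 6 of the trail byte.
            out.append(chr(((b & 0x1F) << 6) | (raw[i + 1] & 0x3F)))
            i += 2
        else:
            raise ValueError("sequence not handled by this sketch")
    return "".join(out)

def byte_tool_sees_slash(raw: bytes) -> bool:
    """A non-multibyte-aware tool searching for the ASCII byte 0x2F."""
    return b"/" in raw

print(lenient_decode(data))        # the lenient reader sees a "/"
print(strict_decode(data))         # the conformant reader refuses the input
print(byte_tool_sees_slash(data))  # the byte-oriented tool finds no "/"
```

Three tools, three different views of the same bytes. The strict one at
least fails loudly; the byte-oriented one gives a wrong answer without
any sign that something is off.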

-- 
David Starner - dstarner98@aasaa.ofe.org
http://dvdeug.dhis.org
Looking for a Debian developer in the Stillwater, Oklahoma area 
to sign my GPG key


