UTC action on malformed/illegal UTF-8 sequences?

From: Hart, Edwin F. (Edwin.Hart@jhuapl.edu)
Date: Thu Oct 19 2000 - 16:09:56 EDT


Does the UTC need to address the issue of malformed and illegal UTF-8
sequences, etc.? The text in question is the example in D32 and the last
sentence of the section on shortest encoding.

Background

The Unicode philosophy has been to avoid killing characters your software
doesn't understand. This enables adding new characters to the code without
killing the software that was written before the new characters were added.

The Security philosophy seems to be: If it is out of specification, kill it
("Anything not explicitly allowed is denied.").

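The situation in the attached message (ill-formed byte sequences, rather
than characters the software does not yet know about) is not the same as
the one described in the first paragraph.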

RFC 2279 (UTF-8) lists some examples that could cause security problems.
Section 3.8 of The Unicode Standard, Version 3.0 seems to permit
interpretation of "ill-formed code value sequences", which can cause other
software to misinterpret the characters and take the wrong action.
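To make the risk concrete, here is a minimal Python sketch of the kind of lenient decoder RFC 2279 warns about. It masks off the UTF-8 prefix bits of a two-byte sequence without checking whether the character really needed two bytes, or whether the second byte is a valid continuation byte. The function name `lenient_decode_2byte` is hypothetical, and the sketch handles only ASCII and two-byte forms. The second example uses 0xC1 0x1C, the %c1%1c sequence from the subject line of the attached BUGTRAQ message; under this sketch's bit masking it comes out as a backslash.

```python
def lenient_decode_2byte(data: bytes) -> str:
    """Hypothetical lenient decoder: handles ASCII and two-byte
    sequences only, and performs NO validity checks."""
    out = []
    i = 0
    while i < len(data):
        b0 = data[i]
        if b0 < 0x80:
            # ASCII byte: copy through unchanged.
            out.append(chr(b0))
            i += 1
        elif 0xC0 <= b0 <= 0xDF and i + 1 < len(data):
            # Two-byte form: 5 payload bits from the lead byte, 6 from the
            # next byte. Missing checks: lead bytes 0xC0/0xC1 always give
            # overlong forms, and the next byte is never tested for the
            # 10xxxxxx continuation pattern.
            b1 = data[i + 1]
            out.append(chr(((b0 & 0x1F) << 6) | (b1 & 0x3F)))
            i += 2
        else:
            # Anything else is replaced rather than rejected.
            out.append("\ufffd")
            i += 1
    return "".join(out)


# The overlong form 0xC0 0xAF (cited in RFC 2279) is read as "/".
print(lenient_decode_2byte(b"..\xc0\xaf..\xc0\xafetc"))  # ../../etc
# 0xC1 0x1C is not even a valid lead/continuation pair, yet yields "\".
print(lenient_decode_2byte(b"..\xc1\x1c"))  # ..\
```

A path check that runs on the raw bytes sees no "/" or "\" at all; only after this decoder runs do the separators appear.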

The issue for UTC may be: if a process receives an "ill-formed" code value
sequence, should the standard specify the action, or should it allow
interpretation and give warnings (as RFC 2279 does)? Will more software
break if ill-formed sequences are allowed, or if they are denied? Given the
number of security problems and fixes I see each week, I personally think
that the UTC needs to tighten the algorithms and require an exception
condition rather than interpret ill-formed code value sequences.
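One way to picture "require an exception condition" is a decoder that refuses to guess. The Python sketch below assumes modern Python 3, whose built-in UTF-8 codec is strict by default; this is an illustration of the idea, not something the UTC or RFC 2279 specifies. `strict_decode` is a hypothetical wrapper name.

```python
def strict_decode(data: bytes) -> str:
    """Decode UTF-8, raising instead of guessing on ill-formed input."""
    # With errors="strict" (the default), Python's UTF-8 codec rejects
    # overlong forms, bad continuation bytes and encoded surrogates by
    # raising UnicodeDecodeError.
    return data.decode("utf-8", errors="strict")


for raw in (b"../etc", b"..\xc0\xafetc", b"..\xc1\x1c"):
    try:
        print(repr(strict_decode(raw)))
    except UnicodeDecodeError as exc:
        # The caller gets an exception condition, not a reinterpreted "/" or "\".
        print("rejected:", raw, exc.reason)
```

The design choice is that the decision about what to do with bad input moves to the caller, which can log and deny it, instead of being made silently inside the decoder.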

Ed

Edwin F. Hart
edwin.hart@jhuapl.edu
The Johns Hopkins University Applied Physics Laboratory
11100 Johns Hopkins Road
Laurel, MD 20723-6099
USA
+1-443-778-6926 (Baltimore area)
+1-240-228-6926 (Washington, DC area)
+1-443-778-1093 (fax)
+1-240-228-1093 (fax)

-----Original Message-----
From: Cris Bailiff [mailto:c.bailiff@E-SECURE.COM.AU]
Sent: Thursday, October 19, 2000 6:08 AM
To: BUGTRAQ@SECURITYFOCUS.COM
Subject: Re: IIS %c1%1c remote command execution

> Florian Weimer <Florian.Weimer@RUS.UNI-STUTTGART.DE> writes:
>
> This is one of the vulnerabilities Bruce Schneier warned of in one of
> the past CRYPTO-GRAM issues. The problem isn't only that the path is
> checked at the wrong time, but also a poorly implemented UTF-8 decoder.
> RFC 2279 explicitly says that overlong sequences such as 0xC0 0xAF are
> invalid.

As someone often involved in reviewing and improving other people's web
code, I have been citing the Unicode security example from RFC 2279 as one
good reason why web programmers must enforce 'anything not explicitly
allowed is denied' almost since it was written. In commercial situations I
have argued myself blue in the face that the equivalent of (Perl-speak)
s!../!!g is not good enough to clean up filename form input parameters or
other pathnames (in Perl, ASP, PHP, etc.). I always end up being proved
right, but it takes a lot of effort. Should prove a bit easier from now
on :-(
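A short Python sketch of why a deletion filter like s!../!!g falls short (an illustration, not code from this thread). Input can reassemble "../" after the deletion, or only turn into "../" after percent-decoding. The allow-list variant decodes first, strictly, and then accepts only names matching a pattern. The function names and the `SAFE_NAME` pattern are hypothetical. Note that in Perl the unescaped dots in s!../!!g match any character; `naive_strip` models the intended literal match.

```python
import re
from urllib.parse import unquote, unquote_to_bytes


def naive_strip(path: str) -> str:
    # Models the intended filter: delete every literal "../", one pass.
    return path.replace("../", "")


# Hypothetical allow-list: one path segment of letters, digits, "_" or "-",
# with at most one ".ext" suffix. Everything else is denied.
SAFE_NAME = re.compile(r"[A-Za-z0-9_-]+(\.[A-Za-z0-9]+)?")


def allowlisted_name(raw: str) -> str:
    """Decode a form parameter strictly, then accept it only if it matches."""
    # Percent-decode to bytes, then decode UTF-8 strictly; overlong forms
    # such as %c0%ae raise UnicodeDecodeError (a subclass of ValueError).
    text = unquote_to_bytes(raw).decode("utf-8")
    if not SAFE_NAME.fullmatch(text):
        raise ValueError(f"rejected file name: {text!r}")
    return text


# Deleting "../" from "..././" leaves "../" behind.
print(naive_strip("..././etc/passwd"))   # ../etc/passwd
# Filtering before percent-decoding misses "..%2f" entirely.
print(unquote(naive_strip("..%2fetc")))  # ../etc

print(allowlisted_name("report.txt"))    # report.txt
for attempt in ("..%2fetc", "%c0%ae%c0%ae", "..././x"):
    try:
        allowlisted_name(attempt)
    except ValueError:
        print("denied:", attempt)
```

The allow-list never has to enumerate the bad encodings; anything it does not recognize is denied.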

>
> It's a pity that a lot of UTF-8 decoders in free software fail such
> tests as well, either by design or careless implementation.

The warning in RFC 2279 hasn't been heeded by a single Unicode decoder that
I have ever tested, commercial or free, including the Solaris 2.6 system
libraries, the Linux unicode_console driver, Netscape Communicator and now,
obviously, IIS. It's unclear to me whether the IIS/NT Unicode decoding is
performed by a system-wide library or if it's custom to IIS - either way,
it can potentially affect almost any Unicode-aware NT application.

I have resisted upgrading various CGI and mod_perl based systems to Perl
5.6 because it has inbuilt (default?) Unicode support, and I've no idea
which applications or Perl libraries might be affected. The problem is even
harder than it looks - which subsystem, out of the HTTP server, the Perl
(or ASP or PHP...) runtime, the standard C libraries and the kernel/OS, can
I expect to be performing the conversion? Which one will get it right? I
think Bruce wildly understated the problem, and I've no idea how to put the
brakes on the crash dive into a character encoding standard which seems to
have no defined canonical encoding and no obvious way of performing
deterministic comparisons.

I suppose as a security professional I should be happy, looking forward to a
booming business...

Cris Bailiff
c.bailiff@e-secure.com.au



This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:21:14 EDT