Re: PDUTR #26 posted

From: David Hopwood (david.hopwood@zetnet.co.uk)
Date: Mon Sep 17 2001 - 07:51:19 EDT


-----BEGIN PGP SIGNED MESSAGE-----

DougEwell2@cs.com wrote:
> In a message dated 2001-09-17 16:24:05 Pacific Daylight Time,
> david.hopwood@zetnet.co.uk writes:
>
> > It doesn't reopen that specific type of security hole, because irregular
> > UTF-8 sequences (as defined by Unicode 3.1) can only decode to characters
> > above 0xFFFF, and those characters are unlikely to be "special" for any
> > application protocol. However, I entirely agree that it's desirable that
> > UTF-8 should only allow shortest form; 6-byte surrogate encodings have
> > always been incorrect.
>
> All Unicode code points of the form U+xxFFFE and U+xxFFFF are special, in
> that they are non-characters and can be treated in a special way by
> applications (e.g. as sentinels).

Arguably that would be a bad idea, but in principle, yes.
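
For readers who want to test for these code points, here is a minimal
sketch. It covers only the noncharacters whose low 16 bits are FFFE or FFFF
in each plane (U+FFFE, U+FFFF, U+1FFFE, ... U+10FFFF); the additional
noncharacter range U+FDD0..U+FDEF is deliberately left out. The function
name is hypothetical and not taken from any library.

```python
def is_plane_final_noncharacter(cp: int) -> bool:
    """Return True if cp is one of the last two code points of a plane.

    Hypothetical helper for illustration only. It does not cover the
    separate noncharacter range U+FDD0..U+FDEF.
    """
    # Must be inside the Unicode code space, and the low 16 bits must be
    # 0xFFFE or 0xFFFF (masking off the lowest bit leaves 0xFFFE for both).
    return 0 <= cp <= 0x10FFFF and (cp & 0xFFFE) == 0xFFFE
```

An application that uses such values as sentinels would call this on each
decoded code point before treating it as ordinary text.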

> I don't agree that irregular UTF-8 sequences in general can only decode to
> characters above 0xFFFF.

That's why I specifically referred to irregular sequences as defined by
Unicode 3.1 (i.e. UAX #27).

> For example, the following irregular UTF-8
> sequences all decode to U+0000:
>
> C0 80
> E0 80 80
> F0 80 80 80
> F8 80 80 80 80
> FC 80 80 80 80 80

Those are illegal, not irregular, sequences according to UAX #27.
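
The distinction can be sketched in code. The following is an illustration
only, under two assumptions: that "irregular" means exactly a 3-byte encoded
high surrogate followed by a 3-byte encoded low surrogate, and that Python's
strict UTF-8 decoder is an acceptable stand-in for "legal" (it rejects
non-shortest forms, 5- and 6-byte forms, and encoded surrogates). The
function name classify is hypothetical.

```python
def classify(seq: bytes) -> str:
    """Classify one encoded sequence as 'legal', 'irregular', or 'illegal'.

    Sketch only: 'irregular' is assumed to mean a 6-byte encoding of a
    surrogate pair; everything else a strict decoder rejects is 'illegal'.
    """
    # Irregular: ED A0..AF xx (high surrogate) then ED B0..BF xx (low surrogate).
    if (len(seq) == 6
            and seq[0] == 0xED and 0xA0 <= seq[1] <= 0xAF and 0x80 <= seq[2] <= 0xBF
            and seq[3] == 0xED and 0xB0 <= seq[4] <= 0xBF and 0x80 <= seq[5] <= 0xBF):
        return "irregular"
    try:
        # Python's strict decoder refuses overlong forms and lone surrogates.
        text = seq.decode("utf-8")
    except UnicodeDecodeError:
        return "illegal"
    if len(text) != 1:
        raise ValueError("expected exactly one encoded sequence")
    return "legal"


# The five sequences quoted above, plus a 6-byte surrogate-pair encoding.
for hex_bytes in ("C080", "E08080", "F0808080", "F880808080",
                  "FC8080808080", "EDA080EDB080"):
    print(hex_bytes, classify(bytes.fromhex(hex_bytes)))
```

Under these assumptions, every non-shortest encoding of U+0000 lands in the
"illegal" bucket, and only the surrogate-pair form is reported as irregular.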

> It is true that the *specific* irregular UTF-8 sequences introduced (and
> required) by CESU-8 decode to characters above 0xFFFF when interpreted as
> CESU-8, and to pairs of surrogate code points when (incorrectly) interpreted
> as UTF-8. Since definition D29, arguably my least favorite part of Unicode,
> requires that all UTFs (including UTF-8) be able to represent unpaired
> surrogates, the character count for the same chunk of data could be different
> depending on whether it is interpreted as CESU-8 or UTF-8. That's a
> potential security hole.

Yes, it is (for decoders that accept Unicode 3.1-irregular sequences, and if
the application makes certain assumptions).
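
A small sketch of the count mismatch, using only the Python standard
library. Both function names are hypothetical. The "lenient UTF-8" decoder
is modelled with the surrogatepass error handler, which lets each 3-byte
surrogate form through as its own code point. The CESU-8 decoder here is a
simplification: it pairs surrogates by round-tripping through UTF-16 and
does not reject 4-byte UTF-8 sequences, which a careful CESU-8 decoder
presumably should.

```python
def decode_lenient_utf8(data: bytes) -> str:
    """Decode as UTF-8 while accepting encoded surrogates (sketch)."""
    # Each ED A0..BF xx triple becomes one surrogate code point.
    return data.decode("utf-8", "surrogatepass")


def decode_cesu8(data: bytes) -> str:
    """Decode as CESU-8 by combining surrogate pairs (simplified sketch)."""
    units = data.decode("utf-8", "surrogatepass")
    # Re-encode the surrogates as UTF-16 code units, then decode strictly so
    # that each high/low pair is combined into one supplementary character.
    return units.encode("utf-16-le", "surrogatepass").decode("utf-16-le")


data = bytes.fromhex("EDA080EDB080")   # surrogate pair D800 DC00
as_utf8 = decode_lenient_utf8(data)
as_cesu8 = decode_cesu8(data)
print(len(as_utf8), len(as_cesu8))
```

The same six bytes count as two code points under the lenient UTF-8 reading
and as one character (U+10000) under the CESU-8 reading, so any length or
bounds check done under one interpretation may not hold under the other.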

> CESU-8 decoders that are really diligent could check for this, of course, but
> when I think of CESU-8 the concept of "really diligent decoders" just doesn't
> spring to mind.

:-)

- --
David Hopwood <david.hopwood@zetnet.co.uk>

Home page & PGP public key: http://www.users.zetnet.co.uk/hopwood/
RSA 2048-bit; fingerprint 71 8E A6 23 0E D3 4C E5 0F 69 8C D4 FA 66 15 01
Nothing in this message is intended to be legally binding. If I revoke a
public key but refuse to specify why, it is because the private key has been
seized under the Regulation of Investigatory Powers Act; see www.fipr.org/rip

-----BEGIN PGP SIGNATURE-----
Version: 2.6.3i
Charset: noconv

iQEVAwUBO6XjmzkCAxeYt5gVAQEZPQgAxP3A/JtRV14WJ4mUurYpL6cRvMeF++uh
GCKjY5ehlt758DstDT1RYRXUYvohWWHzFfJP4vRcqlxfDaxVqmdcIXhAPfPCQk5P
AfKArWVYqARBawS1cIUg8ruuiXzS+4kbRfNE/mq+cZUzSQGIxrIHjWgHEAactRbC
UZsgvMDJlG96i2MVsnyKvat30bY+9fTQYoSQQnnOJrpVZJm590sZD2RTaAol02A2
/Pu6h0t4xI/1/ecCBGrOmpzNfw32L72SnjZWEawkCWe8w1exSZWY6hbkaU0H7d1D
WvZMPaHGnLRS16+Ffye4iYmRX2eyfwJKgwMhXnqdur3vGtmWUuYESA==
=yTJV
-----END PGP SIGNATURE-----


