Re: Unicode and Security

From: Kenneth Whistler (kenw@sybase.com)
Date: Fri Feb 08 2002 - 00:10:41 EST


Elliotte Rusty Harold wrote:

> For past protocols like HTTP and URLs, we can plead ignorance and
> lack of imagination. We never knew how bad things were going to get.
> Now we do. We no longer have any excuses for knowingly designing
> systems that are open to spoofing, denial of service, or outright
> system cracking. Mistakes will of course continue to be made, but we
> have to try to make as few as possible and fix the problems where we
> can as soon as we can. There are legacy problems in HTTP, DNS, URLs,
> and many other systems; but when we're designing something truly new
> like internationalized domain names it only makes sense to avoid
> these known problems.

And I'm with you all the way to this point. Where we part company,
I think, is at the implied "and so..."

If the basic requirements are that we find a way (for IDN) to
present meaningful strings to end users (note, not any natural
language phrase, but just a suitably contained, meaningful
subset thereof that users can live with) and then find a foolproof
way to map that to IP numbers, *and* that those meaningful
strings be truly internationalized and not just the current
restricted subset of ASCII, then we have a problem.

Either you have to more or less completely ignore the structure
and integrity of writing systems, and try to constrain the
problem down to a totally etic, perception-based rule: no visual
confusion allowed among the visible symbols represented in
strings, anywhere, anytime.

Or you have to admit that internationalizing the strings even
just the teensiest bit (e.g. allowing Cyrillic in the door along
with ASCII, or for that matter just allowing in accented Latin
letters along with ASCII) is going to increase the confusability
level in visible symbols used in strings.
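
To make the Cyrillic case concrete, here is a minimal Python sketch.
The spoofed label is a hypothetical illustration (it swaps Cyrillic
U+043E in for both Latin "o"s), and guessing the script from the
first word of the Unicode character name is a crude stand-in for a
real script property lookup, not how any registry actually checks.

```python
import unicodedata

# Hypothetical labels: the second swaps both Latin "o"s for
# CYRILLIC SMALL LETTER O (U+043E), which most fonts draw identically.
latin_label = "microsoft"
spoof_label = "micr\u043es\u043eft"

def scripts(label):
    """Crude script guess: the first word of each character's Unicode name."""
    return {unicodedata.name(ch).split()[0] for ch in label}

def same_after_nfkc(a, b):
    """True if the two strings are equal after NFKC normalization."""
    return unicodedata.normalize("NFKC", a) == unicodedata.normalize("NFKC", b)

if __name__ == "__main__":
    print(latin_label == spoof_label)                 # False
    print(sorted(scripts(latin_label)))               # ['LATIN']
    print(sorted(scripts(spoof_label)))               # ['CYRILLIC', 'LATIN']
    print(same_after_nfkc(latin_label, spoof_label))  # False
```

The two labels compare unequal and normalization does not fold them
together, so nothing at the string level flags what the eye cannot
distinguish.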

The reductio ad absurdum of the first position is that allowing
even a single additional character in domain names, no matter
how distinct or innocuous, incrementally increases the opportunity
for confusion, spoofing, or other monkey business over the
current situation. So if we "no longer have any excuses" to
do anything that might knowingly increase the opportunity for
security holes, then logically, we should just shut down the
whole IDN effort and proclaim to the world, "Let them eat ASCII!"

Heck, it doesn't even have to be close to visual confusability to
cause a problem. What if IDN allowed just two Han characters
in, and nothing else, and those Han characters were for nihon
(Japanese for Japan). Then somebody could register Microsoft<nihon>.com
and never mind the naive user -- the knowledgeable, biliterate
English/Japanese user could be spoofed into thinking that was
Microsoft's Japan division, instead of Trojans 'R Us.

I think that rather than coming to the Unicode list to
proclaim "Unicode is a security risk! The sky is falling!"
the better way to conceive this is that globalization of the
IT infrastructure of the world is a difficult business that
presents many new possibilities for security risks if
internationalization of existing protocols and the handling
of textual data from around the world is not done carefully.

If the customers of the Internet are demanding that it be
internationalized better than it currently is (and I believe
they are), and if part of that internationalization is responding
to demands that Japan be able to have Japanese domain names,
China have Chinese domain names, etc., as I believe it is, then
we just have to come to grips with the complexity of
text handling that that implies. And in turn that means
that just as years ago system programmers learned to their
chagrin that their systems broke because they had been
doing casemapping with c -= 0x20 assignments, so Internet
protocol developers are going to have to learn that their
security is broken if it depends on the structure and
constraints of ASCII, or on the use of small glyph sets where
all the glyphs are visually distinct from each other.
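
For anyone who missed the casemapping episode, a small Python sketch
of the failure. The function name naive_upper is made up for this
illustration, and Python's built-in str.upper() stands in for a
proper Unicode-aware case mapping.

```python
def naive_upper(ch):
    """ASCII-era uppercasing: subtract 0x20 from the code point.

    Only correct for the letters a-z; shown here to illustrate the bug.
    """
    return chr(ord(ch) - 0x20)

if __name__ == "__main__":
    # Works for plain ASCII letters.
    print(naive_upper("a"))        # A

    # U+00FF (y with diaeresis) lands on U+00DF (sharp s), not on
    # its real uppercase form U+0178.
    print(naive_upper("\u00ff"))   # ß
    print("\u00ff".upper())        # Ÿ

    # U+00DF (sharp s) lands on U+00BF (inverted question mark);
    # its full uppercase mapping is the two-letter string "SS".
    print(naive_upper("\u00df"))   # ¿
    print("\u00df".upper())        # SS
```

The subtraction trick encodes an accident of the ASCII layout, and it
silently produces the wrong character once the repertoire grows.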

--Ken


