Re: Security concerns: OGHAM SPACE MARK from David Starner on 2015-07-21 (Unicode Mail List Archive)

From: David Starner <prosfilaes_at_gmail.com>
Date: Tue, 21 Jul 2015 23:29:33 +0000

On Tue, Jul 21, 2015 at 2:55 PM Dreiheller, Albrecht <
albrecht.dreiheller_at_siemens.com> wrote:

> My concern is not about the Ogham space, but about the free usage of
> non-ASCII in programming languages in general.

> Just imagine, when you decide to open a door for public traffic in a busy
> city with a security check point, you wouldn't consider only how to check a
> single person; instead, you have to consider how you would check thousands
> of people within one hour, if you don't plan to close the door again.

There is no way to check thousands of people in an hour through a door
that's a security check point. That's why few places have security check
points. The same holds for code: it's very hard to check any significant
body of code at any speed, so such checking is rarely done.

> Therefore, consider a huge software system developed in, let's
> say, Serbia or Russia using Cyrillic names throughout for classes and
> variables.

> int ци́фра = чита́ть(пе́речень); return ци́фра;

Then do what you need to do. Transliterate the Cyrillic characters and see
if the code works any differently. The language (in any character set) is
going to be a large barrier for a lot of audiences, but that's what it is.
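
One way to run that experiment is sketched below in Python. It assumes a
hand-written transliteration table (hypothetical, covering only the letters
in the quoted example, not any standard scheme) and treats the combining
acute accent U+0301 as a stress mark to drop. The names `transliterate` and
`find_collisions` are made up for illustration. The collision check reports
identifiers that were distinct before transliteration but identical after
it, which is the case where the program could start behaving differently.

```python
# Hypothetical table: just enough letters for the quoted example.
TRANSLIT = {
    "ц": "ts", "и": "i", "ф": "f", "р": "r", "а": "a",
    "ч": "ch", "т": "t", "ь": "", "п": "p", "е": "e", "н": "n",
}

STRESS_MARK = "\u0301"  # COMBINING ACUTE ACCENT


def transliterate(identifier):
    """Drop stress marks, then map each known letter to ASCII."""
    bare = identifier.replace(STRESS_MARK, "")
    return "".join(TRANSLIT.get(ch, ch) for ch in bare)


def find_collisions(identifiers):
    """Return (first, second, ascii_form) for names that merge."""
    seen = {}
    collisions = []
    for name in identifiers:
        ascii_form = transliterate(name)
        # Remember the first identifier that produced this ASCII form.
        first = seen.setdefault(ascii_form, name)
        if first != name:
            collisions.append((first, name, ascii_form))
    return collisions
```

An empty result from `find_collisions` suggests the transliterated program
keeps the same set of distinct names; it says nothing about whether the
names are readable to the reviewer.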

> Looking for a deliberate attempt to confuse within this code would be
> like looking for a needle in a haystack, since every line has non-ASCII in
> it.

Looking for a deliberate attempt to confuse in any code is like looking for
a needle in a haystack. If those two lines shown in my last post had been
hidden in a million-line kernel, they would have been rather hard to find,
particularly if the kernel wasn't warning-clean.

> I used the term "exclusion rules", meaning a ruleset based on the
> confusables list.

The first step is probably to implement it as a lint-type program. Then
discuss it with the compiler writers of the languages you're worried about.
As I've said above, I don't see this as a huge concern for most real-life
programs, since the attack surface is huge.
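
As a rough sketch of what such a lint-type check might look like, the
Python below flags words that contain non-ASCII characters yet fold to a
pure-ASCII spelling. The lookalike table is a tiny hand-picked sample and
an assumption of this sketch; a real tool would presumably load the full
Unicode confusables data instead. The names `skeleton` and `lint` are
hypothetical.

```python
import re

# Illustrative sample only, not the real confusables list.
LOOKALIKES = {
    "\u0430": "a",  # CYRILLIC SMALL LETTER A
    "\u0435": "e",  # CYRILLIC SMALL LETTER IE
    "\u043e": "o",  # CYRILLIC SMALL LETTER O
    "\u0440": "p",  # CYRILLIC SMALL LETTER ER
    "\u0441": "c",  # CYRILLIC SMALL LETTER ES
}

WORD = re.compile(r"\w+")  # Unicode-aware for str patterns in Python 3


def skeleton(word):
    """Replace each known lookalike with its ASCII counterpart."""
    return "".join(LOOKALIKES.get(ch, ch) for ch in word)


def lint(source):
    """Return (line_number, word, ascii_lookalike) for suspicious words."""
    findings = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for word in WORD.findall(line):
            if word.isascii():
                continue
            folded = skeleton(word)
            # Non-ASCII word that reads as plain ASCII: likely an impostor.
            if folded.isascii():
                findings.append((lineno, word, folded))
    return findings
```

Under this rule a name written wholly in Cyrillic is left alone, because
its folded form still contains non-ASCII letters; only a name that
impersonates an ASCII spelling is reported.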

> With "black-listed" I meant "known to be unsafe" in some way.

I.e., JavaScript, C, C++. A huge amount of existing and still-in-use code is
written in C, whose buffer overruns are a notorious source of security
holes. It seems like a much better candidate to be black-listed, if anyone
were capable of such a thing.

> The fathers of ALGOL and other early languages racked their brains to
> avoid ambiguous semantics caused by poor syntax rules.

Published examples of ALGOL 60 are unreadable, and their correctness is
very hard to verify; a modern reader will generally have to start by
reformatting the code, then replacing GOTOs with loops and ifs, and finding
better variable names, if they want to know what's going on.

We've increased code clarity hugely, but reading large amounts of code is
still hard, hard enough that I see stressing about deliberate deception as
a narrow market.

This is not something that really needs language support; it can be done in
compilers and editors and lint-type programs without that support.
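
For the case in the subject line, a check of that kind could be as small as
the Python sketch below. It assumes that "suspicious" simply means any
character in Unicode general category Zs (space separator) other than the
ASCII space; that criterion and the name `find_odd_spaces` are assumptions
of this sketch, not something proposed in the thread.

```python
import unicodedata


def find_odd_spaces(source):
    """Return (line, column, character_name) for non-ASCII space separators."""
    findings = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for column, ch in enumerate(line, start=1):
            # Category Zs covers NO-BREAK SPACE, OGHAM SPACE MARK, etc.
            if ch != " " and unicodedata.category(ch) == "Zs":
                name = unicodedata.name(ch, "UNNAMED")
                findings.append((lineno, column, name))
    return findings
```

An editor could highlight the reported columns, and a lint step could fail
the build on them, with no change to the language itself.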

Received on Tue Jul 21 2015 - 18:30:41 CDT
