www.satn.org

Comments from Frankston, Reed, and Friends

Sunday, October 08, 2006

BobF at 8:27 PM:

The Internet as Design Principle

People tend to think of the Internet in terms of the Web but the real importance is in the basic End-to-End philosophy. You build applications at the edge of the network and don't make unnecessary assumptions about what's in the middle.

Two news items reminded me of the gap between traditional practice and what is possible.

The first was a television show about a series of burglaries. The technique was simple -- cut the alarm and phone wires so that the robbery can't be reported.

The other is the report about the recent mid-air collision in Brazil. According to the article "A police official who has interviewed the American crew said that, in the last known voice communication with the Legacy, the pilots had been told to switch radio frequencies as they entered the jurisdiction of a different air traffic control center. But the official said they had misheard the frequency and failed to tune their radio correctly."

What is striking is that both systems violate the simplest of design principles.

There are historic reasons for these design decisions but there's no excuse for using early 20th century methodology in such critical situations. Why doesn't the alarm system constantly report its status? It's not just that the system is vulnerable to having the wire cut; the absence of a signal should indicate a failure but it's treated as a nonevent. Worse, since the system isn't constantly tested you can never be sure it's operational. Given that there is a dedicated alarm wire it seems fairly trivial to use it as an IP connection and send regular status messages.
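To make the status-message idea concrete, here is a minimal sketch of a monitoring station that treats silence as a failure. The class name, panel names, and 30-second timeout are all assumptions for illustration, not a description of any real alarm product. Timestamps are passed in explicitly so the example behaves the same on every run.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class HeartbeatMonitor:
    """Tracks the last status message from each alarm panel (hypothetical names)."""
    timeout_s: float  # assumed: how long silence is tolerated before it counts as a fault
    last_seen: Dict[str, float] = field(default_factory=dict)

    def record(self, panel_id: str, now: float) -> None:
        # Called whenever a status message arrives from a panel.
        self.last_seen[panel_id] = now

    def silent_panels(self, now: float) -> List[str]:
        # Silence longer than the timeout is reported as a failure, not ignored.
        return sorted(p for p, t in self.last_seen.items() if now - t > self.timeout_s)


monitor = HeartbeatMonitor(timeout_s=30.0)
monitor.record("house-17", now=0.0)
monitor.record("shop-4", now=0.0)
monitor.record("shop-4", now=25.0)  # shop-4 keeps reporting
# house-17's wire is cut after its first report; by t=40 it has been silent too long.
print(monitor.silent_panels(now=40.0))  # ['house-17']
```

The point of the sketch is that a cut wire and a broken panel look the same to the monitor, and both raise a fault instead of passing as a nonevent.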

The idea of changing frequencies makes me think of old movies with pilots looking at the stars to find their position. Even if we are using primitive frequency-based systems, why isn't there some monitoring to assure there is a functional communications path?
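One way to picture such monitoring is a handoff that only completes once someone actually answers on the new channel. The sketch below is purely illustrative: the channel names are made up, and `probe` is a hypothetical stand-in for whatever check would confirm a working path. It is not a model of real air traffic control procedure.

```python
from typing import Callable

# Assumed scenario: channels on which a controller is actually listening.
LISTENING = {"sector-1", "sector-2"}


def controller_answers(channel: str) -> bool:
    # Hypothetical probe: True only if someone responds on this channel.
    return channel in LISTENING


def switch_channel(current: str, heard: str, probe: Callable[[str], bool]) -> str:
    """Return the channel to stay on after an attempted handoff."""
    if probe(heard):
        return heard   # the new path is confirmed end to end
    return current     # no answer: keep the old path so the failure is visible


# The crew was assigned sector-2 but misheard it as sector-9.
print(switch_channel("sector-1", "sector-9", controller_answers))  # sector-1
print(switch_channel("sector-1", "sector-2", controller_answers))  # sector-2
```

A misheard channel then shows up immediately as a failed check while the old path is still available, rather than as silence that nobody notices.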

I've focused only on fairly simple aspects of these systems and the stories may not be accurate. Given an IP connection one can do far more than send a simple alarm signal. There are books such as Fatal Words by Steve Cushing which go into far more detail about the shortcomings of signaling for aircraft.

What is striking about these particular examples is that there is little excuse for not solving them in isolation without waiting for a grand redesign. Why would anyone with a modicum of understanding of security rely on open loop signaling? Why would those responsible for the lives of airline passengers (and pilots) omit a simple safety check like assuring there is a signal path?

Perhaps I'm aware of these because of my experience in dealing with complex and thus unreliable systems. The power of software-based systems is that one can learn from experience and capture this learning. The Internet comes from this world -- you first have to protect yourself from your own mistakes and assure they don't propagate. The basic end-to-end principle of the Internet allows us to design (relatively) reliable systems using unreliable components. By assuming the transport is unreliable we get a more reliable systems design: failures are treated as common occurrences, so we get practice in dealing with them. One lesson is that when things go wrong the alarm system itself is likely to fail.
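A small sketch of what "reliable systems using unreliable components" can look like in code: the sender assumes the link may lose messages and keeps retransmitting until the other end acknowledges. The function names and the test link that drops a fixed number of transmissions are assumptions made up for this example. Real protocols such as TCP do far more than this.

```python
from typing import Callable


def send_reliably(message: str, transmit: Callable[[str], bool], max_attempts: int = 5) -> int:
    """Retransmit until the far end acknowledges; return the number of attempts.

    `transmit` is a hypothetical unreliable transport: it returns True only
    when an acknowledgement comes back from the other endpoint.
    """
    for attempt in range(1, max_attempts + 1):
        if transmit(message):
            return attempt
    # Giving up is itself reported, so the failure cannot pass unnoticed.
    raise TimeoutError(f"no acknowledgement after {max_attempts} attempts")


def make_lossy_link(drops: int) -> Callable[[str], bool]:
    """Build a test transport that loses the first `drops` transmissions."""
    remaining = [drops]

    def transmit(message: str) -> bool:
        if remaining[0] > 0:
            remaining[0] -= 1
            return False  # lost in transit, no acknowledgement
        return True       # delivered and acknowledged

    return transmit


print(send_reliably("alarm: door open", make_lossy_link(drops=2)))  # 3
```

The endpoints carry the responsibility for delivery. The link in the middle is allowed to fail, and the design gets exercised by those failures routinely.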

These incidents are a reminder that such thinking is not yet the norm -- we are still deathly afraid of failure and focus on preventing it. If we are able to take failures in stride we are much less vulnerable. We can't deal with all eventualities but there is no excuse for not taking responsibility for systems design.

It doesn't make sense that something as simple as cutting an alarm wire will defeat the system. I assume that the best systems don't have this problem but this open loop signaling seems to still be the norm.

If people die because a pilot failed to change the frequency, the fault lies with the design of the systems. The pilot is as much a victim of bad design as the passenger. Well, maybe a little more liable -- after all, the pilots have tolerated such negligent design rather than protesting.

The Internet is not the Web; that's just an implication. The Internet is a demonstration of the power of resilient design and the power of verifying rather than simply trusting the behavior of others.





© Copyright 2002-2008 by Daniel Bricklin, Bob Frankston, and David P. Reed
All Rights Reserved.

Comments to: webmaster at satn.org, danb at satn.org, bobf at satn.org, or dpreed at satn.org.
