Saturday, April 12, 2003

DPR at 9:19 AM:

TIA - the new ESP research?

The TIA program at DARPA continues apace.

Someone needs to subject these claims being made for pervasive data gathering and inference to scientific review.

I am reminded of my brief childhood correspondence, nearly 40 years ago when I was 14, with Dr. J.B. Rhine, then the foremost ESP investigator. I was introduced to much of what I know about statistical methods by reading his papers and the books by his fellow researchers. There were amazingly well-documented statistical proofs that information was being conveyed over a new ether. Experiments were designed to demonstrate that such information could be sent backwards in time, and the speed of such communications was carefully measured.

Of course (and this will get me flamed, I'm sure), these incredibly careful statistical analyses were being applied to experimental data that was flawed in very serious ways, gamed by the subjects who had incredible incentive to confirm the investigators' hypotheses, etc.

When I got a little older, I realized that the sophisticated statistics were obscuring understanding, as much as helping it, in this case. The search for sophisticated statistical methods seemed to be driven by a need to find some way to extract the "right" results out of noise (those that confirmed the experimenters' beliefs).

The DoD, by the way, supported a bunch of ESP research that tended to confirm the potential of ESP in predicting behaviors of our cold war enemies.

Now this idea of extracting reliable and meaningful information from massive data collection arises. A scientist might ask, what would falsify the underlying hypothesis? Is there a null hypothesis at all?

In fact, the privacy and liberty folks, by expressing their concern in terms of risks to "privacy," tend to reinforce the belief that real investigatory information can be extracted by inference from a very noisy and randomly selected pile of information.

The problem with statistical inference is that it is not neutral with respect to hypotheses you are testing, nor with respect to control of the sampling process.
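To make that concrete, here is a minimal sketch in Python of what happens when the sampling is not controlled. Everything in it is an illustrative assumption of mine, not a description of any actual ESP experiment or of anything DARPA is doing: a guessing task with a 1-in-5 chance per trial, 25 trials per subject, 2000 subjects, and a cutoff of 10 hits for calling someone "gifted". The names chance_hits and select_then_retest are hypothetical. No signal exists anywhere in this data, yet picking the high scorers after the fact manufactures a group that looks impressive until it is retested.

```python
import random

def chance_hits(rng, trials=25, p_chance=0.2):
    """Count correct guesses when every guess succeeds only by chance."""
    return sum(1 for _ in range(trials) if rng.random() < p_chance)

def select_then_retest(n_subjects=2000, threshold=10, seed=0):
    """Screen pure-noise subjects, keep the high scorers, then retest them."""
    rng = random.Random(seed)
    # First round: every score is produced by chance alone.
    first_round = [chance_hits(rng) for _ in range(n_subjects)]
    # "Discovery": keep only the subjects whose scores look impressive.
    selected = [score for score in first_round if score >= threshold]
    # Retest one fresh score per selected subject; still nothing but chance.
    retest = [chance_hits(rng) for _ in selected]
    return first_round, selected, retest

def mean(xs):
    """Arithmetic mean, or NaN for an empty list."""
    return sum(xs) / len(xs) if xs else float("nan")

first_round, selected, retest = select_then_retest()
print(f"subjects tested:        {len(first_round)}")
print(f"'gifted' (>= 10 of 25): {len(selected)}")
print(f"their first-round mean: {mean(selected):.1f}")
print(f"their retest mean:      {mean(retest):.1f}  (chance predicts 5.0)")
```

The arithmetic applied to the selected group can be perfectly correct and still mean nothing, because the group was chosen by looking at the noise first. Only the retest, which plays the role of a controlled experiment, shows that there was nothing there.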

What we are asked to believe is that the work being done by DARPA (on clean data sets, looking for evidence to support hypotheses about behavior that cannot be validated against real behavioral experiments) can tell us anything about how such technology will perform in the very specific context of the real-world activities being observed and the hypotheses being tested.

This entire field of mining uncontrolled data, and inferencing, is quite analogous to the ESP enterprise in this sense. (It would be laughable as GIGO if it weren't taken so seriously...)

And I am afraid that the country is unable to understand that the so-called scientists (including Adm. Poindexter) who are leading this are about as clueless as the ESP researchers were, as to their biases, etc. Clever computer science, even powerful and correct computer science, will serve the same role in this process that the powerful statistical methods served in Dr. Rhine's ESP research enterprise. The math was not wrong... but it helped create a delusion.

The result in the TIA case will be very dangerous pseudo-scientific bullshit, I suspect. Unfortunately it will be turned on us. I hope the computer science participants working on the data mining and inferencing tools don't expect a pass because they were "just following orders".

Monday, April 07, 2003

BobF at 3:38 PM:

Implementing VisiCalc

In preparation for the Computer History Museum's panel "The Origins and Impact of VisiCalc," I wrote about my experience implementing VisiCalc. It's about writing bytes of code and fitting them within about 20KB, which is the amount of space a small thumbnail might take on a web page. But it's not about the code itself. The code was always just a means of creating a product, and it succeeds if the program itself disappears and people simply "connect" with the task at hand.

