BTW, perhaps some would be interested in Ed's Own Version of Firewall (HIPS) Leak Testing?
First you embed the 150 or so Matousec cases among perhaps 15,000 cases that are not malware but still trigger at least one of the HIPS checkpoints. Then you wire Matousec's testicles to the computer and start the test. He now knows that the a priori probability that any given sequence of popups is malware is only about .01. But wait: when I studied decision theory in school, you also had to worry about the relative cost of misses and false alarms. So let's be generous and say that a false alarm zaps him with 100 V and a miss costs 600 V, to start. Then run him through the test for a score. Then re-randomize the order of the cases and try again with another HIPS. When finished, on to subject #2. I think this gives the tester some vested interest, like a real user would have; it might allow adjusting the voltages to give the best overall score for your decision metric; and it could eventually lead to a confidence factor to help the user decide. It would also provide a more valid comparison and better guidelines than the current procedure. Scientific method, after all.
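
For anyone who wants the decision-theory arithmetic spelled out, here is a rough sketch in Python. The case counts and voltages are the ones from the post; the function names are made up for illustration, and the "block when the expected cost of allowing exceeds the expected cost of blocking" rule is the standard textbook one, not anything a particular HIPS is known to implement.

```python
from fractions import Fraction

# Numbers taken from the post; everything else here is illustrative.
MALWARE_CASES = 150      # Matousec leak-test cases
BENIGN_CASES = 15000     # harmless cases that still trigger a HIPS checkpoint
COST_FALSE_ALARM = 100   # volts for blocking a benign case
COST_MISS = 600          # volts for allowing a malware case


def prior_malware():
    """A priori probability that a randomly drawn case is malware."""
    return Fraction(MALWARE_CASES, MALWARE_CASES + BENIGN_CASES)


def posterior_threshold():
    """Blocking is cheaper on average once P(malware | popups) exceeds this."""
    return Fraction(COST_FALSE_ALARM, COST_FALSE_ALARM + COST_MISS)


def likelihood_ratio_threshold():
    """How much more likely the popups must be under malware than under
    benign software before blocking pays off, given the prior and costs."""
    p = prior_malware()
    return Fraction(COST_FALSE_ALARM, COST_MISS) * (1 - p) / p


def total_cost(misses, false_alarms):
    """Score for one run through the test, in volts (lower is better)."""
    return misses * COST_MISS + false_alarms * COST_FALSE_ALARM


if __name__ == "__main__":
    print("prior:", float(prior_malware()))
    print("block if posterior >", float(posterior_threshold()))
    print("block if likelihood ratio >", float(likelihood_ratio_threshold()))
    print("always allow:", total_cost(MALWARE_CASES, 0), "V")
    print("always block:", total_cost(0, BENIGN_CASES), "V")
```

With these numbers the prior works out to 1/101, so the popups have to be roughly 17 times more likely under malware than under benign software before clicking "block" is the cheaper choice. Raising the miss voltage relative to the false-alarm voltage lowers that bar, which is the adjustment the post is getting at.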
