Completely

If nothing else, the selection of samples is wrong. It isn't an easy thing to do; you can't take a few antivirus programs and call anything detected by at least one of them a virus. This way:
1. The programs used for such selections get an immediate advantage
2. You are ignoring the viruses that the other A/V programs detect and the selected group doesn't
3. Any false positive (or corrupted sample) detected by one of the selected A/Vs is immediately counted as a miss for all the other ones. So the testbed may actually be a big folder of garbage.
You can't get meaningful results from flawed samples...
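
To make the three points above concrete, here is a minimal sketch with entirely made-up data. The product names (AV_A, AV_B, AV_C), the file names and the detection sets are all hypothetical, and so are the helper functions; it only illustrates how building the testbed from the detections of a few products skews the scores.

```python
# Hypothetical ground truth: the files that really are malicious.
# "junk" stands for a corrupted or harmless file.
truly_malicious = {"v1", "v2", "v3", "v4"}

# Hypothetical detections of each product over the whole collection.
detections = {
    "AV_A": {"v1", "v2", "junk"},  # used for selection, has one false positive
    "AV_B": {"v2", "v3"},          # used for selection
    "AV_C": {"v1", "v3", "v4"},    # not used for selection
}


def build_testbed(detections, selectors):
    """Flawed method: keep anything flagged by at least one selector."""
    testbed = set()
    for name in selectors:
        testbed |= detections[name]
    return testbed


def score(detections, samples):
    """Fraction of the given samples that each product detects."""
    return {
        name: len(found & samples) / len(samples)
        for name, found in detections.items()
    }


testbed = build_testbed(detections, ["AV_A", "AV_B"])
biased = score(detections, testbed)        # scores on the flawed testbed
fair = score(detections, truly_malicious)  # scores against ground truth

for name in detections:
    print(f"{name}: flawed testbed {biased[name]:.0%}, ground truth {fair[name]:.0%}")
```

With these invented numbers, AV_A gets 75% on the flawed testbed but only 50% against the ground truth, while AV_C drops from 75% to 50%: v4, which only AV_C detects, never makes it into the testbed, and AV_A's false positive "junk" is counted as a miss for everyone else.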