I'm afraid there are quite a few - just like for other AVs.
Basically, 15 samples don't tell you anything useful. You can always easily pick 15 (or 150, it doesn't matter) samples such that a particular AV misses all of them or, vice versa, detects all of them. So the result only reflects the sample selection.
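To make that concrete, here's a rough Python sketch. Everything in it is made up for illustration: the pool of 100,000 samples, the 98% detection rate, and the helper names `pick` and `score` are assumptions, not data from any real AV test. It just shows that the same product can "score" 0% or 100% depending on which 15 samples you hand-pick.

```python
# Hypothetical pool: True means the AV detects the sample, False means it misses.
# The 98% overall detection rate is an invented number, purely for illustration.
pool = [True] * 98_000 + [False] * 2_000

def pick(pool, n, want_detected):
    """Hand-pick n samples that all have the desired outcome (cherry-picking)."""
    chosen = [d for d in pool if d == want_detected][:n]
    if len(chosen) < n:
        raise ValueError("pool has too few samples with that outcome")
    return chosen

def score(samples):
    """Fraction of the given samples that were detected."""
    return sum(samples) / len(samples)

print(f"whole pool:        {score(pool):.1%}")
print(f"15 picked to fail: {score(pick(pool, 15, False)):.0%}")
print(f"15 picked to pass: {score(pick(pool, 15, True)):.0%}")
```

Swap 15 for 150 and nothing changes, as long as the pool holds at least that many misses.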
If there were a million samples (sure, it would be kinda hard to dynamically test all of them), it would be a bit harder to manipulate the sample set. But with a number this low, you have to trust that the samples were chosen randomly... and even if they were, there's no guarantee they match the usual statistical distribution; that can depend on the geographical origin of the samples/tester, etc.
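For a feel of how much the sample count matters even when the selection really is random, here's a small sketch using the Wilson score interval for a proportion. The counts (14 of 15, and the same ratio out of a million) are invented for illustration, and `wilson_interval` is just a helper name I made up. Note this only covers plain sampling noise; it says nothing about a sample set that is biased by region or by whoever picked it.

```python
import math

def wilson_interval(detected, total, z=1.96):
    """Approximate 95% Wilson score interval for a detection rate (z=1.96)."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = detected / total
    denom = 1 + z * z / total
    centre = (p + z * z / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)

# Invented counts: the same observed ratio with very different sample sizes.
for detected, total in [(14, 15), (933_333, 1_000_000)]:
    low, high = wilson_interval(detected, total)
    print(f"{detected}/{total}: plausible detection rate {low:.1%} to {high:.1%}")
```

With 15 samples the plausible range spans tens of percentage points; with a million it shrinks to a fraction of a point.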