Ordinary detections can only be fuzzy if they are based on heuristics; those based on definitions match a specific signature, so they can't be fuzzy at all. They can be false, but that's not being fuzzy, that's simply a mistakenly marked signature.
I suppose your definition of "fuzzy" is different from mine.
Sure, matching a specific signature is somehow "strict" - but if, for example, the signature is (on purpose) chosen to be so weak that it's hard to predict what it's actually gonna detect, I'd call it "fuzzy".
The definitions system is aimed at reducing "fuzziness", whereas the FileRep system, as described in the article, seems like a breeding ground for it.
I certainly can't agree with that. FileRep means more info, more info means less uncertainty / fuzziness.
I've made several clear points on why this is so, and you haven't actually countered any of them directly with clear arguments. What you've done is point out a good mechanism of action not mentioned in the article (the lack of reputation) and try to dilute the argument by claiming all detections are fuzzy.
I certainly haven't said that all detections are fuzzy. But many detections are - they are based on heuristics, evaluating common properties, possibly not related to malicious behavior... but they still work.
The point is that the FileRep mechanisms of action described in the article aren't just fuzzy, they're nonsensical. They are aimed at behaviors shared by both valid and malicious programs, rely on already implemented methods of protection existing outside the Avast product, or depend on falsifiable methods of verification.
You think they are nonsensical; I would say that if anybody thinks about them, they actually make good sense, so we won't agree on that.
But even if they were nonsensical - who cares? We have huge sets of malicious and clean files and can easily measure whether the method works. And if it does (meaning it detects malicious files with few or no false positives, which can possibly be excluded somehow), even if it works based on strange data... it just works. The same is true for reputation-less heuristic rules.
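To illustrate what "measure whether the method works" can mean, here is a minimal sketch. It is not how Avast actually evaluates anything; the sample fields (`prevalence`, `signed`), the example rule and the tiny data sets are all made up for illustration.

```python
def evaluate(rule, malicious, clean):
    """Return (detection_rate, false_positive_rate) for a rule on labeled samples."""
    detected = sum(1 for sample in malicious if rule(sample))
    false_positives = sum(1 for sample in clean if rule(sample))
    return detected / len(malicious), false_positives / len(clean)


# Hypothetical rule: flag files that are both rare and unsigned.
def rare_and_unsigned(sample):
    return sample["prevalence"] < 5 and not sample["signed"]


# Hypothetical labeled sets; real ones would be far larger.
malicious = [
    {"prevalence": 1, "signed": False},
    {"prevalence": 2, "signed": False},
    {"prevalence": 500, "signed": False},
    {"prevalence": 1, "signed": True},
]
clean = [
    {"prevalence": 10000, "signed": True},
    {"prevalence": 3, "signed": False},
    {"prevalence": 800, "signed": False},
    {"prevalence": 2, "signed": True},
]

detection_rate, false_positive_rate = evaluate(rare_and_unsigned, malicious, clean)
print(detection_rate, false_positive_rate)
```

The point of the sketch is only that a rule can be judged by its measured rates on known files, regardless of whether the properties it looks at seem "sensible".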
I've noted that the sandboxing uses FileRep, but it uses it to decide which file to sandbox. From there the analysis is dependent on existing tools, heuristics and definitions. Based on FileRep's mechanism of action described in the article, there's hardly enough significant data for it to be used as an actual or conclusive heuristics engine; its aim is too wide. So it only makes sense that in the end FileRep depends on sandboxing and standard heuristics and definitions, but the problem is that it's advertised to succeed where heuristics and definitions fail, and they're equally fallible whether they're tipped off by FileRep or standard scanning.
Again, I'm not saying FileRep is the only criterion for the decision, or that FileRep is a completely standalone thing that doesn't interact with the rest of the antivirus - that wouldn't make sense.
You seem to be fine with "heuristics" - which is a generic term that can cover almost anything, but usually means something "fuzzy", a decision/guess based on incomplete input data - but you refuse to accept that FileRep can be a useful additional piece of information for that very heuristic.
So FileRep is not meant to "win over heuristics" - its purpose is to improve the heuristics, and to make it possible to include new heuristic rules.
If your definition of heuristics has some very strict rules that don't include online queries... fine, then it's a new "uberheuristics" - but that's just terminology.
A file in the wild can be identified either by its content, through some form of standard file descriptors already mentioned by FlyingRobot, or by its name. Now if the file changes its content on a per-user basis, as mentioned in the article, the standard file descriptors (like hash) fail, and it can only be identified by its name. Sure, I suppose the FileRep definition could be analyzing the complete file contents, and focusing on one unchanged aspect of it, which would be the malicious code surrounded by obfuscating information, but I highly doubt FileRep is doing that as it would be heavy on both system resources and data transfer.
The file is always identified by its hash - identification by its name is hardly relevant, you can rename your files however you wish.
If the file is changed/unique for every user, the hash will also be unique for every user, so nobody else in the world has a file with this hash ==> the file is suspicious.
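As a rough sketch of that idea (not the actual FileRep implementation): the `prevalence` dictionary below stands in for a hypothetical online reputation service that knows how many users have reported each hash, and `min_users` is an arbitrary threshold chosen for the example.

```python
import hashlib


def sha256_of(data: bytes) -> str:
    """Identify a file by the SHA-256 hash of its content, not by its name."""
    return hashlib.sha256(data).hexdigest()


def is_suspicious(data: bytes, prevalence: dict, min_users: int = 2) -> bool:
    """Flag a file whose hash has been seen by fewer than min_users users."""
    return prevalence.get(sha256_of(data), 0) < min_users


# A widely distributed file: many users report the same hash.
common_file = b"same installer bytes for everyone"
prevalence = {sha256_of(common_file): 100000}

# A per-user variant: a few changed bytes give a hash nobody else has.
per_user_variant = common_file + b" padding-for-user-1234"

print(is_suspicious(common_file, prevalence))
print(is_suspicious(per_user_variant, prevalence))
```

Renaming either file changes nothing here, since only the content is hashed. On its own, low prevalence is of course just one more input for the rest of the engine, not a verdict.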
As for digital signatures, it's beside the point whether I know how they're counterfeited, the point is that it's well documented that they are. And not only counterfeited but flat out stolen. They are used by many large software companies, true, but the thing is, even though the most used programs on my system are by large manufacturers, they're outnumbered by various small utilities which aren't digitally signed. Though large developers have the strongest presence, there's a larger amount of "off-brand" software out there that's not digitally signed. That's why relying on digital signatures means little to nothing.
Stolen, sure. Counterfeited... well, let's say we do our homework.
But again - various small utilities without a digital signature are somehow more suspicious than those big programs from large manufacturers. And this "little to nothing", even if it were so, is simply another little piece of information in the puzzle. And the puzzle (= the existing heuristic rules) is built on many similar pieces of "little to nothing"; that's what heuristics are about.
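A minimal sketch of how several weak pieces can add up, purely for illustration: the signal names, weights and threshold are invented here and are not real Avast rules.

```python
# Hypothetical weights: each signal alone means "little to nothing".
WEIGHTS = {
    "unsigned": 1.0,   # no digital signature
    "rare_hash": 2.0,  # hash seen by almost no other user
    "packed": 1.0,     # content looks packed/obfuscated
}
THRESHOLD = 3.0  # arbitrary cut-off chosen for this example


def score(signals):
    """Sum the weights of all known signals observed for a file."""
    return sum(WEIGHTS[name] for name in signals if name in WEIGHTS)


def verdict(signals):
    """Return 'suspicious' only when enough weak signals accumulate."""
    return "suspicious" if score(signals) >= THRESHOLD else "ok"


print(verdict({"unsigned"}))
print(verdict({"unsigned", "rare_hash"}))
```

A missing signature alone stays below the threshold; combined with a rare hash, it crosses it. That is the sense in which reputation data is one more piece of the puzzle rather than a replacement for the other rules.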
Anyway, I think I've spent a bit too much time trying to explain something, and I probably haven't gotten the point across, so I guess I'd rather do something productive instead. The FileRep system is new, and it will certainly evolve in the near future. We'll see how it goes - but so far, it's been doing pretty well (in terms of helping to discover unknown malware).