As machine-learning (ML)-based systems for malware detection become more
prevalent, it becomes necessary to quantify their benefits over the more
traditional anti-virus (AV) systems widely used today. Building an
agreed-upon test set for benchmarking malware detection systems on pure
classification performance is not practical. Instead, we tackle the problem
with a new testing methodology: we evaluate the change in performance on a
set of known benign and malicious files as adversarial modifications are
applied. The change in performance under each evasion technique then
quantifies a system's robustness against that approach. Through these
experiments we show, in a quantifiable way, that purely ML-based systems can
be more robust than AV products at detecting malware that attempts evasion
through modification, but may be slower to adapt in the face of
significantly novel attacks.
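
As a rough illustration of the evaluation methodology described above, the
sketch below re-scores a fixed malware set after each round of adversarial
modification and records the detection rate; the resulting drop quantifies
robustness. The detector interface, the `modify` transform, and the toy
samples in the demo are hypothetical stand-ins, not the paper's actual
systems or evasion techniques.

```python
from typing import Callable, List, Tuple


def detection_rate(detect: Callable[[bytes], bool],
                   samples: List[bytes]) -> float:
    """Fraction of samples the detector flags as malicious."""
    return sum(detect(s) for s in samples) / len(samples)


def robustness_curve(detect: Callable[[bytes], bool],
                     malware: List[bytes],
                     modify: Callable[[bytes], bytes],
                     rounds: int = 5) -> List[Tuple[int, float]]:
    """Apply one evasion step per round to every sample, then re-score.

    Returns (round, detection_rate) pairs; a shallow decline indicates a
    detector that is robust to this particular modification strategy.
    """
    curve = [(0, detection_rate(detect, malware))]
    for i in range(1, rounds + 1):
        malware = [modify(m) for m in malware]  # one adversarial step
        curve.append((i, detection_rate(detect, malware)))
    return curve


if __name__ == "__main__":
    # Toy demo: a signature-style detector that matches a byte pattern,
    # and a modification that appends benign-looking padding.
    detect = lambda s: b"EVIL" in s[:64]          # hypothetical detector
    modify = lambda s: b"\x00PAD" + s             # hypothetical evasion
    samples = [b"EVIL" + bytes([i]) for i in range(10)]
    for rnd, rate in robustness_curve(detect, samples, modify):
        print(f"round {rnd}: detection rate = {rate:.2f}")
```

In this toy setup the signature detector's rate collapses once the padding
pushes the pattern past its scan window, whereas a detector scoring
whole-file features would be unaffected; the same comparison, performed
with real detectors and real evasion techniques, is what the methodology
measures.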