Malware remains a serious problem for corporations, government agencies, and
individuals, as attackers continue to use it as a tool to effect frequent and
costly network intrusions. Machine learning holds the promise of automating the
work required to detect newly discovered malware families, and could
potentially learn generalizations about malware and benign software that
support the detection of entirely new, unknown malware families. Unfortunately,
few proposed machine-learning-based malware detection methods have achieved the
low false positive rates required to deliver deployable detectors.
In this paper we introduce a deep neural network malware classifier that
achieves a usable detection rate at an extremely low false positive rate and
scales to real-world training example volumes on commodity hardware.
Specifically, we show that our system achieves a 95% detection rate at a 0.1%
false positive rate
(FPR), based on more than 400,000 software binaries sourced directly from our
customers and internal malware databases. We achieve these results by directly
learning on all binaries, without any filtering, unpacking, or manually
separating binary files into categories. Further, we confirm our false positive
rates directly on a live stream of files coming in from Invincea's deployed
endpoint solution, provide an estimate of how many new binary files we expect
to see per day on an enterprise network, and describe how that relates to the
false positive rate and translates into an intuitive threat score.
Our results demonstrate that it is now feasible to quickly train and deploy a
low-resource, highly accurate machine learning classification model, with false
positive rates that approach those of traditional labor-intensive,
signature-based methods, while also detecting previously unseen malware.
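
The relationship between a fixed false positive rate, the detection rate measured at that rate, and the daily false alarm load on a network can be illustrated with a minimal sketch. It is not the system described above: the function names, the strict "score greater than threshold" flagging rule, and all numbers used with it are assumptions chosen for illustration only.

```python
import math


def threshold_at_fpr(benign_scores, target_fpr):
    """Pick a score threshold so that at most target_fpr of the benign
    scores lie strictly above it (illustrative helper, hypothetical name)."""
    ranked = sorted(benign_scores, reverse=True)
    # Number of benign files we may flag without exceeding the target rate.
    allowed = math.floor(target_fpr * len(ranked))
    if allowed >= len(ranked):
        return float("-inf")  # every benign file may be flagged
    # At most `allowed` scores are strictly greater than ranked[allowed].
    return ranked[allowed]


def detection_rate(malware_scores, threshold):
    """Fraction of malware scores strictly above the threshold."""
    flagged = sum(1 for s in malware_scores if s > threshold)
    return flagged / len(malware_scores)


def expected_false_positives_per_day(fpr, new_benign_files_per_day):
    """Expected number of benign files flagged per day at a given FPR."""
    return fpr * new_benign_files_per_day
```

For example, under the purely hypothetical assumption that an enterprise network sees 10,000 previously unseen benign binaries per day, a 0.1% FPR corresponds to about 10 false alarms per day in expectation. The threshold helper is conservative: ties at the threshold are not flagged, so the realized FPR on the benign sample never exceeds the target.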
External datasets
Invincea's in-house database of malicious and benign binaries
Live feed from Invincea customer networks
Live threat intelligence feed from a Jotti subscription