Abstract
The extensive damage caused by malware requires anti-malware systems to be
constantly improved to prevent new threats. The current trend in malware
detection is to employ machine learning models to aid in the classification
process. We propose a new dataset with the objective of improving current
anti-malware systems. The focus of this dataset is to improve host-based
intrusion detection systems by providing API call sequences for thousands of
malware samples executed in Windows 10 virtual machines. A tutorial on how to
create and expand this dataset is provided along with a benchmark demonstrating
how to use this dataset to classify malware. The data contains long sequences
of API calls for each sample, and in order to create models that can be
deployed on resource-constrained devices, three feature selection methods were
tested. The principal innovation, however, lies in the multi-label
classification system in which one sequence of APIs can be tagged with multiple
labels describing its malicious behaviours.
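
As a rough illustration of the multi-label idea, the sketch below encodes one API call sequence as a count vector and its tags as a multi-hot target, so that a single sample can carry several behaviour labels at once. The label names, the API vocabulary, and the sample sequence are hypothetical and chosen only for illustration; they are not taken from the dataset, and the count-based features are an assumption rather than the feature selection methods tested in the paper.

```python
from collections import Counter

# Hypothetical label vocabulary; the dataset's actual label set is not given here.
LABELS = ["keylogger", "ransomware", "trojan"]


def multi_hot(tags, labels=LABELS):
    """Return a 0/1 vector with one entry per label in `labels`."""
    tag_set = set(tags)
    return [1 if label in tag_set else 0 for label in labels]


def api_counts(sequence, vocabulary):
    """Count how often each API in `vocabulary` occurs in `sequence`."""
    counts = Counter(sequence)
    return [counts[api] for api in vocabulary]


# Made-up sample for illustration only.
sequence = ["CreateFileW", "WriteFile", "WriteFile", "RegSetValueExW"]
vocabulary = ["CreateFileW", "RegSetValueExW", "WriteFile"]

features = api_counts(sequence, vocabulary)   # one count per vocabulary entry
target = multi_hot(["trojan", "ransomware"])  # two labels set on the same sample
```

A multi-label classifier would then be trained to predict every entry of `target` from `features`, rather than choosing a single class per sample.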