Nowadays, malware and malware incidents are increasing daily, despite the
availability of various antivirus systems and malware detection and
classification methodologies. Security experts have focused mainly on machine
learning techniques to detect malware and determine its family, and many
static, dynamic, and hybrid techniques have been proposed for that purpose. In
this study, static analysis is applied to malware samples to extract API calls,
which are among the most widely used features in machine and deep learning
models because they represent the behavior of malware samples.
Because the rapid increase and continuous evolution of malware degrade the
detection capacity of antivirus scanners, recent and up-to-date datasets of
malicious software have become necessary to overcome this drawback. This paper
introduces two new datasets: one with 14,616 samples obtained and compiled from
VirusShare and one with 9,795 samples from VirusSample. In addition, benchmark
results based on the static API calls of the malware samples are presented for
several machine and deep learning models on these datasets. We believe that
these two datasets and the benchmark results will enable researchers to test
and validate their methods and approaches in this field.
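To show how static API calls can become input features for a machine learning
model, the following minimal sketch assumes each sample has already been
reduced to a list of API names (for example, from its import table). It uses
one common encoding, a binary bag-of-API-calls vector, which is not
necessarily the representation used for the benchmarks in this paper. The
functions `build_vocabulary` and `to_binary_vectors` and the sample API lists
are hypothetical and serve only as illustration.

```python
from typing import Dict, List, Sequence


def build_vocabulary(samples: Sequence[Sequence[str]]) -> List[str]:
    """Collect every distinct API name across all samples, sorted for a stable column order."""
    return sorted({api for calls in samples for api in calls})


def to_binary_vectors(samples: Sequence[Sequence[str]], vocab: List[str]) -> List[List[int]]:
    """Encode each sample as a 0/1 vector: 1 if the API appears in its extracted call list."""
    index: Dict[str, int] = {api: i for i, api in enumerate(vocab)}
    vectors: List[List[int]] = []
    for calls in samples:
        row = [0] * len(vocab)
        for api in calls:
            if api in index:  # APIs not seen when the vocabulary was built are ignored
                row[index[api]] = 1
        vectors.append(row)
    return vectors


if __name__ == "__main__":
    # Hypothetical API-call lists; real ones would come from a static PE parser.
    samples = [
        ["CreateFileW", "WriteFile", "RegSetValueExW"],
        ["VirtualAlloc", "WriteProcessMemory", "CreateRemoteThread"],
        ["CreateFileW", "ReadFile"],
    ]
    vocab = build_vocabulary(samples)
    X = to_binary_vectors(samples, vocab)
    print(vocab)
    for row in X:
        print(row)
```

Each row of `X` can then be passed, together with a family label, to any
standard classifier. A binary encoding records only whether an API is
imported. Call counts or ordered call sequences are alternatives that suit
frequency-based or sequence models, respectively.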