Abstract
There is a limited amount of publicly available data to support research in
malware analysis technology. In particular, there are virtually no publicly
available datasets generated from rich sandboxes such as Cuckoo/CAPE. The
benefit of using dynamic sandboxes is that they realistically simulate file
execution on the target machine and produce a log of that execution. Because
the machine can become infected by the malware, there is a good chance of
capturing the malicious behavior in the execution logs, which allows
researchers to study such behavior in detail. Although the subsequent analysis
of log information is extensively covered in industrial cybersecurity backends,
to our knowledge only limited effort has been invested in academia to advance
such log analysis capabilities using cutting-edge techniques. We make this
sample dataset available to support designing new machine learning methods for
malware detection, especially for automatic detection of generic malicious
behavior. The dataset was collected in cooperation between Avast Software
and Czech Technical University - AI Center (AIC).