Malware detection plays a vital role in computer security. Modern machine
learning approaches have been centered around domain knowledge for extracting
malicious features. However, many potential features can be used, and it is
time-consuming and difficult to manually identify the best features, especially
given the diverse nature of malware.
In this paper, we propose Neurlux, a neural network for malware detection.
Neurlux does not rely on any feature engineering; rather, it learns
automatically from dynamic analysis reports that detail behavioral information.
Our model borrows ideas from the field of document classification, using the word
sequences present in the reports to predict whether a report comes from a malicious
binary. We investigate the learned features of our model and show which
components of the reports it tends to weight most heavily. Then, we
evaluate our approach on two different datasets and report formats, showing
that Neurlux improves on the state of the art and can effectively learn from
the dynamic analysis reports. Furthermore, we show that our approach is
portable to other malware analysis environments and generalizes to different
datasets.
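
To make the document-classification framing concrete, the following minimal sketch shows one way a dynamic analysis report could be turned into a fixed-length word sequence suitable as input to a neural text classifier. It is an illustration under stated assumptions, not the preprocessing used by Neurlux: the report layout (`api_calls`, `mutexes`), the helper names (`report_to_tokens`, `build_vocab`, `encode`), the tokenization rule, and the vocabulary and length limits are all hypothetical.

```python
import re
from collections import Counter

# Assumed tokenization rule: runs of letters, digits, and underscores.
TOKEN_RE = re.compile(r"[A-Za-z0-9_]+")


def report_to_tokens(report):
    """Flatten a nested JSON-like report into a lowercase word sequence."""
    tokens = []

    def walk(node):
        if isinstance(node, dict):
            for key, value in node.items():
                # Keys are kept as words, since they name the behavior type.
                tokens.extend(TOKEN_RE.findall(str(key).lower()))
                walk(value)
        elif isinstance(node, list):
            for item in node:
                walk(item)
        else:
            tokens.extend(TOKEN_RE.findall(str(node).lower()))

    walk(report)
    return tokens


def build_vocab(token_lists, max_size=10000):
    """Map the most frequent words to integer ids; 0 = padding, 1 = unknown."""
    counts = Counter(tok for toks in token_lists for tok in toks)
    vocab = {"<pad>": 0, "<unk>": 1}
    for tok, _ in counts.most_common(max_size - 2):
        vocab[tok] = len(vocab)
    return vocab


def encode(tokens, vocab, length):
    """Truncate or pad a word sequence to a fixed number of integer ids."""
    ids = [vocab.get(tok, 1) for tok in tokens[:length]]
    return ids + [0] * (length - len(ids))


# Hypothetical report fragment, for illustration only.
sample_report = {
    "api_calls": [{"name": "CreateFileW", "args": {"path": r"C:\Temp\a.exe"}}],
    "mutexes": ["Global_xyz"],
}

tokens = report_to_tokens(sample_report)
vocab = build_vocab([tokens])
encoded = encode(tokens, vocab, length=16)
```

The resulting integer sequence could then be fed to an embedding layer followed by any sequence classifier; the choice of that classifier is outside the scope of this sketch.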