Abstract
Deep learning (DL) has proven to be effective in detecting sophisticated
malware that is constantly evolving. Even though deep learning has alleviated
the feature engineering problem, finding the optimal DL model, in terms of both
its architecture, via neural architecture search (NAS), and its set of
hyper-parameters, remains a challenge that requires domain expertise. In
addition, many of the proposed state-of-the-art models are very complex and may
not be the best fit for different datasets. A promising approach, known as
Automated Machine Learning (AutoML), can reduce the domain expertise required
to implement a custom DL model. AutoML reduces the amount of human
trial-and-error involved in designing DL models, and in more recent
implementations can find new model architectures with relatively low
computational overhead.
This work provides a comprehensive analysis and insights on using AutoML for
static and online malware detection. For static detection, our analysis is performed on
two widely used malware datasets: SOREL-20M to demonstrate efficacy on large
datasets; and EMBER-2018, a smaller dataset specifically curated to hinder the
performance of machine learning models. In addition, we show the effects of
tuning the NAS process parameters on finding a better-performing malware detection
model on these static analysis datasets. Further, we demonstrate that
AutoML is performant in online malware detection scenarios using Convolutional
Neural Networks (CNNs) for cloud IaaS. We compare an AutoML technique to six
existing state-of-the-art CNNs using a newly generated online malware dataset
with and without other applications running in the background during malware
execution. In general, our experimental results show that the performance of
AutoML-based static and online malware detection models is on par with or even
better than that of state-of-the-art models or hand-designed models presented in
the literature.