Abstract
Recent growth and proliferation of malware have tested practitioners' ability
to promptly classify new samples according to malware families. In contrast to
labor-intensive reverse engineering efforts, machine learning approaches have
demonstrated increased speed and accuracy. However, most existing deep-learning
malware family classifiers must be calibrated using a large number of samples
that are painstakingly analyzed by hand before training. Furthermore, as novel
malware samples arise that are beyond the scope of the training set, additional
reverse engineering effort must be employed to update the training set. The
sheer volume of new samples found in the wild creates substantial pressure on
practitioners' ability to reverse engineer enough malware to adequately train
modern classifiers. In this paper, we present MalMixer, a malware family
classifier using semi-supervised learning that achieves high accuracy with
sparse training data. We present a domain-knowledge-aware data augmentation
technique for malware feature representations, enhancing few-shot performance
of semi-supervised malware family classification. We show that MalMixer
achieves state-of-the-art performance in few-shot malware family classification
settings. Our research confirms the feasibility and effectiveness of
lightweight, domain-knowledge-aware data augmentation methods for malware
features and shows the capabilities of similar semi-supervised classifiers in
addressing malware classification issues.