Abstract
For a long time, malware classification and analysis have been an arms race
between antivirus systems and malware authors. Although static analysis is
vulnerable to evasion techniques, it remains popular as the first line of
defense in antivirus systems. However, most static analyzers have failed to gain
the trust of practitioners because of their black-box nature. We propose MAlign, a
novel static malware family classification approach inspired by genome sequence
alignment that not only classifies malware families but also provides
explanations for its decisions. MAlign encodes raw bytes using nucleotides and
adopts genome sequence alignment approaches to create a signature of a malware
family based on the conserved code segments in that family, without any human
labor or expertise. We evaluate MAlign on two malware datasets, where it
outperforms other state-of-the-art machine-learning-based malware classifiers
(by 0.07% to 4.49%), especially on small datasets (by 1.2% to 19.48%).
Furthermore, we explain the signatures MAlign generates for different malware
families, illustrating the kinds of insights it can provide to analysts, and
show its efficacy as an analysis tool. Additionally, we evaluate its
theoretical and empirical robustness against some common attacks. In this
paper, we approach static malware analysis from a unique perspective, aiming to
strike a delicate balance among performance, interpretability, and robustness.
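The abstract does not specify MAlign's exact byte-to-nucleotide mapping or its alignment parameters, so the following is only a minimal sketch of the general idea under stated assumptions. It assumes a 2-bit-per-nucleotide encoding (A=00, C=01, G=10, T=11), so each byte becomes four nucleotides. As a stand-in for the genome alignment tools MAlign adopts, it uses a plain Smith-Waterman local alignment with arbitrary scores. The names `encode_bytes` and `smith_waterman` and the scoring values are hypothetical illustrations, not MAlign's API. A high local score between two samples suggests a shared (conserved) byte segment, which is the kind of region a family signature would be built from.

```python
# Hypothetical 2-bit mapping (A=00, C=01, G=10, T=11); MAlign's real mapping may differ.
NUCLEOTIDES = "ACGT"


def encode_bytes(data: bytes) -> str:
    """Encode each byte as four nucleotides, most significant bit pair first."""
    out = []
    for b in data:
        for shift in (6, 4, 2, 0):
            out.append(NUCLEOTIDES[(b >> shift) & 0b11])
    return "".join(out)


def smith_waterman(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -2) -> int:
    """Return the best local alignment score between a and b (score only, no traceback).

    Scoring values are illustrative assumptions, not parameters from the paper.
    """
    prev = [0] * (len(b) + 1)  # previous DP row
    best = 0
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: scores never drop below zero.
            cur[j] = max(0, diag, prev[j] + gap, cur[j - 1] + gap)
            best = max(best, cur[j])
        prev = cur
    return best


if __name__ == "__main__":
    # Two hypothetical samples that share the bytes b"MZ" surrounded by differing bytes.
    s1 = encode_bytes(b"\x00MZ\x01")
    s2 = encode_bytes(b"\xffMZ\xfe")
    print(s1, s2, smith_waterman(s1, s2))
```

Here `encode_bytes(b"MZ")` yields `"CATCCCGG"`, and the shared segment drives the local alignment score between the two samples. A real system would align many samples per family and keep only segments conserved across most of them. This sketch shows only the pairwise building block.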
External Datasets
Kaggle Microsoft Malware Classification Challenge (Big 2015)
Microsoft Machine Learning Security Evasion Competition (2020)