Abstract
Malware programs are grouped into families based on their penetration
technique, source code, and other characteristics. Classifying
malware programs into their respective families is essential for building
effective defenses against cyber threats. Machine learning models have great
potential for malware detection on mobile devices, as malware families can be
recognized by classifying permission data extracted from Android manifest
files. Still, the malware classification task is challenging due to the
high-dimensional nature of permission data and the limited availability of
training samples. In particular, the steady emergence of new malware families
makes it impossible to acquire a comprehensive training set covering all the
malware classes. In this work, we present a malware classification system that,
in addition to classifying known malware families, detects samples from
previously unseen ones. Specifically, we
combine an open-set recognition technique developed within the computer vision
community, namely MaxLogit, with a tree-based Gradient Boosting classifier,
which is particularly effective in classifying high-dimensional data. Our
solution turns out to be very practical, as it can be seamlessly employed in a
standard classification workflow, and efficient, as it adds minimal
computational overhead. Experiments on public and proprietary datasets
demonstrate the potential of our solution, which has been deployed in a
business environment.
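
As a rough illustration of the MaxLogit idea, the sketch below takes the raw per-class scores (logits) that a multiclass classifier produces before any softmax, and flags a sample as belonging to a new family when its largest logit falls below a threshold. This is a minimal sketch and not the system described in this work: the function names, the `UNKNOWN` label, and the quantile-based threshold calibration are all assumptions made for illustration, and the logits are assumed to come from an already trained gradient boosting model.

```python
from typing import List, Sequence

UNKNOWN = -1  # hypothetical label used here for "sample from a new family"


def max_logit_scores(logits: Sequence[Sequence[float]]) -> List[float]:
    """Return the largest raw class score for each sample."""
    return [max(row) for row in logits]


def calibrate_threshold(known_logits: Sequence[Sequence[float]],
                        keep_fraction: float = 0.95) -> float:
    """Pick a threshold from held-out samples of known families.

    Assumption: roughly `keep_fraction` of known samples should score
    at or above the threshold and so be kept as known.
    """
    scores = sorted(max_logit_scores(known_logits))
    cut = int((1.0 - keep_fraction) * len(scores))
    cut = min(max(cut, 0), len(scores) - 1)
    return scores[cut]


def predict_open_set(logits: Sequence[Sequence[float]],
                     threshold: float) -> List[int]:
    """Predict the arg-max class, or UNKNOWN when the max logit is too low."""
    preds = []
    for row in logits:
        best = max(range(len(row)), key=row.__getitem__)
        preds.append(best if row[best] >= threshold else UNKNOWN)
    return preds


if __name__ == "__main__":
    # Toy logits for three samples over three known families.
    toy = [[2.0, 0.1, -1.0], [0.2, 0.1, 0.3], [-0.5, 3.0, 0.0]]
    print(predict_open_set(toy, threshold=1.0))
```

Because the rule only inspects scores that the classifier already computes, it slots into an ordinary predict step with little extra cost, which matches the low-overhead property claimed above. How the threshold is chosen in practice is not stated here, so the calibration helper should be read as one plausible option.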