Malware families discovery via Open-Set Recognition on Android manifest permissions


arxiv

AI Security Portal bot

Information in the literature database is collected automatically.

Source

https://arxiv.org/abs/2505.12750

PDF

https://arxiv.org/pdf/2505.12750

Paper Information

Authors: Filippo Leveni, Matteo Mistura, Francesco Iubatti, Carmine Giangregorio, Nicolò Pastore, Cesare Alippi, Giacomo Boracchi
Published: May 19, 2025
Affiliation: Department of Electronics, Information and Bioengineering. Politecnico di Milano - Milan, Italy
Country: Italy
Venue: Computing Research Repository (CoRR)

Labels Estimated by AI

Malware Detection Method, Dataset for Malware Classification, Online Malware Detection

These labels were automatically added by AI and may be inaccurate.
For details, see About Literature Database.

Abstract

Malware consists of malicious programs that are grouped into families based on their penetration technique, source code, and other characteristics. Classifying malware programs into their respective families is essential for building effective defenses against cyber threats. Machine learning models have great potential for malware detection on mobile devices, as malware families can be recognized by classifying permission data extracted from Android manifest files. Still, malware classification is challenging due to the high dimensionality of permission data and the limited availability of training samples. In particular, the steady emergence of new malware families makes it impossible to acquire a comprehensive training set covering all malware classes. In this work, we present a malware classification system that, in addition to classifying known malware families, detects new ones. Specifically, we combine an open-set recognition technique developed within the computer vision community, namely MaxLogit, with a tree-based Gradient Boosting classifier, which is particularly effective at classifying high-dimensional data. Our solution is practical, as it can be seamlessly employed in a standard classification workflow, and efficient, as it adds minimal computational overhead. Experiments on public and proprietary datasets demonstrate the potential of our solution, which has been deployed in a business environment.
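The general idea described in the abstract (permission features from the manifest, then MaxLogit rejection on top of a multiclass classifier) can be illustrated with a minimal sketch. This is not the authors' implementation; several assumptions are made and all function names are hypothetical. The manifest is assumed to be already decoded to plain-text XML (APKs ship a binary manifest that must first be decoded, e.g. with a tool such as apktool). Permissions are assumed to be encoded as a multi-hot vector over a fixed vocabulary. Per-class logits are assumed to come from any multiclass model's raw scores; for example, scikit-learn's `GradientBoostingClassifier.decision_function` returns one raw score per class in the multiclass case. The rejection threshold is calibrated here as a quantile of MaxLogit scores on known-family validation samples, with an illustrative acceptance rate that is not taken from the paper.

```python
import xml.etree.ElementTree as ET
from typing import List, Sequence

# Namespace used for android:* attributes in AndroidManifest.xml
ANDROID_NS = "{http://schemas.android.com/apk/res/android}"


def manifest_permissions(manifest_xml: str) -> List[str]:
    """Hypothetical helper: permission names from <uses-permission> in a decoded manifest."""
    root = ET.fromstring(manifest_xml)
    names = (elem.get(ANDROID_NS + "name") for elem in root.iter("uses-permission"))
    return [name for name in names if name is not None]


def multi_hot(permissions: Sequence[str], vocabulary: Sequence[str]) -> List[int]:
    """Binary vector over a fixed vocabulary (assumed encoding); unseen permissions are dropped."""
    present = set(permissions)
    return [1 if p in present else 0 for p in vocabulary]


def max_logit(logits: Sequence[float]) -> float:
    """MaxLogit score: the largest raw class score, with no softmax applied."""
    return max(logits)


def calibrate_threshold(known_logits: Sequence[Sequence[float]],
                        accept_rate: float = 0.95) -> float:
    """Threshold such that roughly `accept_rate` of known-family samples score at or above it."""
    scores = sorted(max_logit(row) for row in known_logits)
    if not scores:
        raise ValueError("need at least one known-family validation sample")
    index = int((1.0 - accept_rate) * len(scores))
    return scores[min(index, len(scores) - 1)]


def predict_open_set(logits: Sequence[float], threshold: float) -> int:
    """Argmax class index, or -1 ("unknown family") when the MaxLogit score is below the threshold."""
    if max_logit(logits) < threshold:
        return -1
    return max(range(len(logits)), key=lambda k: logits[k])


if __name__ == "__main__":
    manifest = """<manifest xmlns:android="http://schemas.android.com/apk/res/android" package="com.example">
      <uses-permission android:name="android.permission.SEND_SMS"/>
      <uses-permission android:name="android.permission.INTERNET"/>
    </manifest>"""
    vocab = ["android.permission.INTERNET",
             "android.permission.READ_CONTACTS",
             "android.permission.SEND_SMS"]
    print(multi_hot(manifest_permissions(manifest), vocab))  # [1, 0, 1]

    # Toy validation logits for three known families (made-up numbers for illustration)
    known = [[5.0, 1.0, 0.5], [0.2, 4.0, 1.0], [1.0, 0.5, 3.0], [2.5, 0.1, 0.3]]
    t = calibrate_threshold(known, accept_rate=0.75)  # 3.0
    print(predict_open_set([0.1, 6.0, 0.2], t))  # 1 (known family)
    print(predict_open_set([0.3, 0.2, 0.1], t))  # -1 (flagged as a new family)
```

Because MaxLogit only reads the classifier's existing outputs, the rejection step adds a single max and comparison per sample, which is consistent with the abstract's claim that the approach fits a standard classification workflow with little overhead.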

External Datasets

Drebin

Proprietary dataset