Benchmarking Android Malware Detection: Traditional vs. Deep Learning Models

Authors: Guojun Liu, Doina Caragea, Xinming Ou, Sankardas Roy | Published: 2025-02-20 | Updated: 2025-07-30

2025.02.20

Authors: Guojun Liu, Doina Caragea, Xinming Ou, Sankardas Roy
Published: 2025-02-20 | Updated: 2025-07-30

Source: https://arxiv.org/abs/2502.15041

PDF: https://arxiv.org/pdf/2502.15041

AIにより推定されたラベル

透かし技術データセットの影響レビューと調査

※ こちらのラベルはAIによって自動的に追加されました。そのため、正確でないことがあります。
詳細は文献データベースについてをご覧ください。

Abstract

Android malware detection has been extensively studied using both traditional machine learning (ML) and deep learning (DL) approaches. While many state-of-the-art detection models, particularly those based on DL, claim superior performance, they often rely on limited comparisons, lacking comprehensive benchmarking against traditional ML models across diverse datasets. This raises concerns about the robustness of DL-based approaches’ performance and the potential oversight of simpler, more efficient ML models. In this paper, we conduct a systematic evaluation of Android malware detection models across four datasets: three recently published, publicly available datasets and a large-scale dataset we systematically collected. We implement a range of traditional ML models, including Random Forests (RF) and CatBoost, alongside advanced DL models such as Capsule Graph Neural Networks (CapsGNN), BERT-based models, and ExcelFormer based models. Our results reveal that in many cases simpler and more computationally efficient ML models achieve comparable or even superior performance compared with DL models. These findings highlight the need for rigorous benchmarking in Android malware detection research. We encourage future studies to conduct more comprehensive benchmarking comparisons between traditional and advanced models to ensure a more accurate assessment of detection capabilities. To facilitate further research, we provide access to our dataset, including app IDs, hash values, and labels.