Abstract
With the rise of large language models such as ChatGPT, non-decisional
models have been applied to a wide range of tasks. ChatGPT has also drawn
attention to the traditional decision-centric task of Android malware
detection. Although scholars have proposed effective detection methods, these
methods suffer from poor interpretability. Specifically, while they excel at
classifying applications as benign or malicious and can detect malicious
behavior, they often fail to provide detailed explanations for the decisions
they make. This challenge raises concerns about the reliability of existing
detection schemes and questions their true ability to understand complex data.
In this study, we investigate the influence of the non-decisional model,
ChatGPT, on the traditional decision-centric task of Android malware detection.
We choose three state-of-the-art solutions, Drebin, XMAL, and MaMaDroid,
conduct a series of experiments on publicly available datasets, and carry out a
comprehensive comparison and analysis. Our findings indicate that these
decision-driven solutions primarily rely on statistical patterns within
datasets to make decisions, rather than genuinely understanding the underlying
data. In contrast, ChatGPT, as a non-decisional model, excels in providing
comprehensive analysis reports, substantially enhancing interpretability.
Furthermore, we conduct surveys among experienced developers. The results
highlight developers' preference for ChatGPT, which offers in-depth insights
that improve both their efficiency and their understanding of the challenges
involved. Together, these studies and analyses present developers with a novel
perspective on Android malware detection: enhancing the reliability of
detection results from a non-decisional standpoint.