Computing Research Repository (CoRR)
On building machine learning pipelines for Android malware detection: a procedural survey of practices, challenges and opportunities
Abstract
As the smartphone market leader, Android has been a prominent target for
malware attacks. The number of malicious applications (apps) identified for it
has increased continually over the past decade, creating an immense challenge
for all parties involved. For market holders and researchers, in particular,
the large number of samples has made manual malware detection unfeasible,
leading to an influx of research that investigates Machine Learning (ML)
approaches to automate this process. However, while some of the proposed
approaches achieve high performance, rapidly evolving Android malware has made
them unable to maintain their accuracy over time. This has created a need in
the community to conduct further research, and build more flexible ML
pipelines. Doing so, however, is currently hindered by the lack of a systematic
overview of the existing literature that researchers can learn from in order to
improve upon existing solutions. Existing survey papers often focus only on parts of the ML
process (e.g., data collection or model deployment), while omitting other
important stages, such as model evaluation and explanation. In this paper, we
address this problem with a review of 42 highly cited papers, spanning a decade
of research (from 2011 to 2021). We introduce a novel procedural taxonomy of
the published literature, covering how these papers have used ML algorithms, what
features they have engineered, which dimensionality reduction techniques they
have applied, what datasets they have used for training, and what their
evaluation and explanation strategies are. Drawing from this taxonomy, we also
identify gaps in knowledge and provide ideas for improvement and future work.