Abstract
Several recent works have focused on best practices for applying machine
learning to cybersecurity. In the context of malware, TESSERACT highlighted the
impact of concept drift on detection performance and proposed temporal and
spatial constraints to ensure realistic time-aware evaluations, which the
community has since adopted. In this paper, we demonstrate
striking discrepancies in the performance of learning-based malware detection
across the same time frame when evaluated on two representative Android malware
datasets used in top-tier security conferences, both adhering to established
sampling and evaluation guidelines. This calls into question our ability to
understand how current state-of-the-art approaches would perform in realistic
scenarios.
To address this, we identify five novel temporal and spatial bias factors that
affect realistic evaluations. We thoroughly evaluate the impact of these
factors in the Android malware domain on two representative datasets and five
Android malware classifiers used or proposed in top-tier security conferences.
For each factor, we provide practical and actionable recommendations that the
community should integrate into their methodology for more realistic and
reproducible settings.