Breaking Out from the TESSERACT: Reassessing ML-based Malware Detection under Spatio-Temporal Drift

arxiv

AI Security Portal bot

Information in the literature database is collected automatically.

Source

https://arxiv.org/abs/2506.23814

PDF

https://arxiv.org/pdf/2506.23814

Paper Information

Authors: Theo Chow, Mario D'Onghia, Lorenz Linhardt, Zeliang Kan, Daniel Arp, Lorenzo Cavallaro, Fabio Pierazzi
Published: 6-30-2025
Affiliation: King's College London
Country: United Kingdom
Conference: Computing Research Repository (CoRR)

Labels Estimated by AI

Dataset for Malware Classification, Evaluation Metrics, Bias

These labels were automatically added by AI and may be inaccurate.

Abstract

Several recent works focused on the best practices for applying machine learning to cybersecurity. In the context of malware, TESSERACT highlighted the impact of concept drift on detection performance and suggested temporal and spatial constraints to be enforced to ensure realistic time-aware evaluations, which have been adopted by the community. In this paper, we demonstrate striking discrepancies in the performance of learning-based malware detection across the same time frame when evaluated on two representative Android malware datasets used in top-tier security conferences, both adhering to established sampling and evaluation guidelines. This questions our ability to understand how current state-of-the-art approaches would perform in realistic scenarios. To address this, we identify five novel temporal and spatial bias factors that affect realistic evaluations. We thoroughly evaluate the impact of these factors in the Android malware domain on two representative datasets and five Android malware classifiers used or proposed in top-tier security conferences. For each factor, we provide practical and actionable recommendations that the community should integrate in their methodology for more realistic and reproducible settings.
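The temporal and spatial constraints mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' code and does not cover the five bias factors the paper introduces. It only shows the general idea of a time-aware split (every training sample predates every test sample) followed by downsampling malware in the test set to a fixed ratio. The function names, the `(timestamp, label)` sample layout, and the 10% ratio used in the usage note are assumptions made for illustration.

```python
import random
from datetime import date


def temporal_split(samples, split_date):
    """Split (timestamp, label) pairs so that training strictly precedes testing.

    Hypothetical helper: label 1 means malware, label 0 means goodware.
    """
    train = [s for s in samples if s[0] < split_date]
    test = [s for s in samples if s[0] >= split_date]
    return train, test


def downsample_malware(samples, target_ratio, seed=0):
    """Randomly drop malware until its share of the set is at most target_ratio.

    All goodware is kept. target_ratio is an assumed value in (0, 1),
    not a figure taken from the paper.
    """
    rng = random.Random(seed)
    goodware = [s for s in samples if s[1] == 0]
    malware = [s for s in samples if s[1] == 1]
    # Largest malware count m satisfying m / (m + len(goodware)) <= target_ratio.
    max_malware = int(target_ratio * len(goodware) / (1 - target_ratio))
    if len(malware) > max_malware:
        malware = rng.sample(malware, max_malware)
    # Restore chronological order after recombining the two classes.
    return sorted(goodware + malware, key=lambda s: s[0])
```

A call such as `train, test = temporal_split(samples, date(2020, 7, 1))` followed by `test = downsample_malware(test, 0.1)` yields a test set that lies entirely after the training period and contains at most 10% malware. The paper's point is that two datasets can both satisfy constraints of this kind and still produce very different results, so a split like this is a starting point rather than a guarantee of a realistic evaluation.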

External Datasets

AndroZoo

APIGraph

Transcendent