Demystifying Behavior-Based Malware Detection at Endpoints


arxiv

AI Security Portal bot

Information in the literature database is collected automatically.

Source

https://arxiv.org/abs/2405.06124

PDF

https://arxiv.org/pdf/2405.06124

Paper Information

Authors: Yigitcan Kaya, Yizheng Chen, Shoumik Saha, Fabio Pierazzi, Lorenzo Cavallaro, David Wagner, Tudor Dumitras
Published: 2024-05-10
Affiliation: University of California, Santa Barbara
Country: United States of America
Venue: Computing Research Repository (CoRR, arXiv preprint)

Labels Estimated by AI

Endpoint Detection, Malware Classification, High Difficulty Sample

These labels were automatically added by AI and may be inaccurate.
For details, see About Literature Database.

Abstract

Machine learning is widely used for malware detection in practice. Prior behavior-based detectors most commonly rely on traces of programs executed in controlled sandboxes. However, sandbox traces are unavailable to the last line of defense offered by security vendors: malware detection at endpoints. A detector at endpoints consumes the traces of programs running on real-world hosts, because sandbox analysis might introduce intolerable delays. Despite the success of ML methods in sandboxes, research hints at potential challenges for them at endpoints, e.g., highly variable malware behaviors. Nonetheless, the impact of these challenges on existing approaches, and how their excellent sandbox performance translates to the endpoint scenario, remain unquantified. We present the first measurement study of the performance of ML-based malware detectors at real-world endpoints. Leveraging a dataset of sandbox traces and a dataset of in-the-wild program traces, we evaluate two scenarios in which the endpoint detector is trained on (i) sandbox traces (convenient and accessible) or (ii) endpoint traces (less accessible, since they require collecting telemetry data). This allows us to identify a wide gap between prior methods' detection performance in sandboxes (over 90%) and at endpoints (below 20% and 50% in (i) and (ii), respectively). We pinpoint and characterize the challenges contributing to this gap, such as label noise, behavior variability, or sandbox evasion. To close this gap, we propose techniques that yield a relative improvement of 5-30% over the baselines. Our evidence suggests that applying detectors trained on sandbox data to endpoint detection, scenario (i), is challenging. The most promising direction is training detectors on endpoint data, scenario (ii), which marks a departure from widespread practice. We implement a leaderboard for realistic detector evaluations to promote research.
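The abstract reports its gains as relative improvements over the baselines, which are easy to misread as percentage-point gains. The following is a minimal Python sketch of that arithmetic, assuming detection performance is expressed as a rate in [0, 1]. The helper names `relative_improvement` and `improved_rate` and the 0.40 baseline are hypothetical illustrations, not values or code from the paper.

```python
def relative_improvement(baseline: float, improved: float) -> float:
    """Relative improvement of `improved` over `baseline`, as a fraction.

    Both arguments are detection rates in [0, 1]; `baseline` must be > 0.
    """
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (improved - baseline) / baseline


def improved_rate(baseline: float, rel_gain: float) -> float:
    """Rate reached when `baseline` improves by `rel_gain` (0.30 means 30%)."""
    return baseline * (1.0 + rel_gain)


if __name__ == "__main__":
    # Hypothetical baseline detection rate at endpoints (illustrative only,
    # not a number reported in the paper).
    base = 0.40
    for gain in (0.05, 0.30):
        new = improved_rate(base, gain)
        print(f"{gain:.0%} relative gain: {base:.2f} -> {new:.2f} "
              f"(+{new - base:.2f} absolute)")
```

With the hypothetical 0.40 baseline, a 5% relative gain moves the rate to 0.42 and a 30% relative gain moves it to 0.52, i.e. an increase of 2 to 12 percentage points rather than 5 to 30.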

External Datasets

endpoint dataset from Avllazagaj et al.

sandbox dataset from Tencent HABO

sandbox dataset from Cuckoo Sandbox