Abstract
Despite the promising results of machine learning models in malicious file
detection, these models face the problem of concept drift, caused by the
constant evolution of malicious files. This leads to declining performance over
time, as the data distribution of new files differs from that of the training
data, requiring frequent model updates. In this work, we propose a model-agnostic protocol to improve a
baseline neural network against drift. We show the importance of feature
reduction and training with the most recent validation set possible, and
propose a loss function named Drift-Resilient Binary Cross-Entropy, an
improvement on the classical Binary Cross-Entropy that is more effective against drift.
We train our model on the EMBER dataset, published in 2018, and evaluate it on a
dataset of recent malicious files, collected between 2020 and 2023. Our
improved model shows promising results, detecting 15.2% more malware than a
baseline model.
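
To make two of the ingredients mentioned above concrete, the following is a minimal sketch, not the method itself. It shows the classical Binary Cross-Entropy that serves as the starting point (the Drift-Resilient variant is not defined in this abstract and is deliberately not reproduced here), and one possible way to hold out the most recent samples as a validation set. The function names, the `(timestamp, label)` record format, and the `val_fraction` parameter are assumptions made for illustration only.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical record format: (ISO-like timestamp string, label),
# where label is 1 for malicious and 0 for benign.
Sample = Tuple[str, int]


def binary_cross_entropy(
    y_true: Sequence[int], y_prob: Sequence[float], eps: float = 1e-12
) -> float:
    """Classical mean Binary Cross-Entropy (the baseline loss, not the
    drift-resilient variant proposed in the paper)."""
    if len(y_true) != len(y_prob) or len(y_true) == 0:
        raise ValueError("y_true and y_prob must be non-empty and equally long")
    total = 0.0
    for y, p in zip(y_true, y_prob):
        # Clip probabilities so that log() never receives 0.
        p = min(max(p, eps), 1.0 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(y_true)


def chronological_split(
    samples: Sequence[Sample], val_fraction: float = 0.2
) -> Tuple[List[Sample], List[Sample]]:
    """Sort samples by timestamp and hold out the most recent fraction
    as the validation set (an assumed reading of 'most recent validation set')."""
    ordered = sorted(samples, key=lambda s: s[0])
    n_val = max(1, int(round(len(ordered) * val_fraction)))
    return ordered[:-n_val], ordered[-n_val:]


if __name__ == "__main__":
    data = [("2018-03", 0), ("2017-01", 1), ("2018-10", 1),
            ("2017-06", 0), ("2018-07", 0)]
    train, val = chronological_split(data, val_fraction=0.2)
    print("train:", train)
    print("val:  ", val)
    print("BCE:  ", binary_cross_entropy([1, 0], [0.5, 0.5]))
```

Under this assumed setup, validating on the newest samples lets model selection reflect the data closest in time to deployment, which is the intuition behind using the most recent validation set possible.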