Abstract
Continual Learning (CL) for malware classification tackles the rapidly
evolving nature of malware threats and the frequent emergence of new types.
Generative Replay (GR)-based CL systems utilize a generative model to produce
synthetic versions of past data, which are then combined with new data to
retrain the primary model. Traditional machine learning techniques in this
domain often struggle with catastrophic forgetting, where a model's performance
on old data degrades as it is trained on new data.
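
The generative-replay loop described in this paragraph can be sketched as follows. This is a minimal illustration and not the paper's implementation: `ToyGenerator` (a per-class mean plus Gaussian noise) stands in for a trained generative model, `CentroidClassifier` stands in for the primary model, and all names and parameters are hypothetical.

```python
import random
from typing import Dict, List, Tuple

Sample = Tuple[List[float], int]  # (feature vector, class label)


def class_means(data: List[Sample]) -> Dict[int, List[float]]:
    """Compute the mean feature vector of each class in `data`."""
    grouped: Dict[int, List[List[float]]] = {}
    for x, y in data:
        grouped.setdefault(y, []).append(x)
    return {y: [sum(col) / len(xs) for col in zip(*xs)] for y, xs in grouped.items()}


class ToyGenerator:
    """Hypothetical stand-in for a generative model (e.g. a GAN).

    It remembers one mean vector per class and replays noisy copies of it.
    """

    def __init__(self, noise: float = 0.05, seed: int = 0) -> None:
        self.means: Dict[int, List[float]] = {}
        self.noise = noise
        self.rng = random.Random(seed)

    def fit(self, data: List[Sample]) -> None:
        self.means = class_means(data)

    def sample(self, per_class: int) -> List[Sample]:
        # Returns an empty list before the first call to fit().
        return [
            ([m + self.rng.gauss(0.0, self.noise) for m in mu], y)
            for y, mu in self.means.items()
            for _ in range(per_class)
        ]


class CentroidClassifier:
    """Hypothetical primary model: nearest class centroid, retrained from scratch."""

    def __init__(self) -> None:
        self.centroids: Dict[int, List[float]] = {}

    def fit(self, data: List[Sample]) -> None:
        self.centroids = class_means(data)

    def predict(self, x: List[float]) -> int:
        def sq_dist(c: List[float]) -> float:
            return sum((a - b) ** 2 for a, b in zip(x, c))

        return min(self.centroids, key=lambda y: sq_dist(self.centroids[y]))


def generative_replay(tasks: List[List[Sample]], replay_per_class: int = 20) -> CentroidClassifier:
    """Train on a sequence of tasks, mixing synthetic past data into each one."""
    generator = ToyGenerator()
    classifier = CentroidClassifier()
    for new_data in tasks:
        replay = generator.sample(replay_per_class)  # synthetic versions of past data
        train = new_data + replay                    # combine with the new task's data
        classifier.fit(train)                        # retrain the primary model
        generator.fit(train)                         # update the generator for the next task
    return classifier
```

Because the toy classifier is refit from scratch on every task, it would lose all earlier classes if `replay` were left out of `train`; with replay, classes from earlier tasks remain in its centroid table.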
In this paper, we introduce a GR-based CL system that employs Generative
Adversarial Networks (GANs) with feature matching loss to generate high-quality
malware samples. Additionally, we implement innovative selection schemes for
replay samples based on the model's hidden representations.
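
As a rough illustration of the two ingredients named above, the sketch below computes a feature matching loss in its common form (the squared distance between the mean intermediate features of real and generated batches) and applies one possible selection rule. The abstract does not specify the selection schemes, so `select_replay_samples`, which keeps the generated samples whose hidden representations lie closest to the batch mean, is purely an assumption, as are all function names.

```python
from typing import List, Sequence

Vector = Sequence[float]


def mean_vector(batch: Sequence[Vector]) -> List[float]:
    """Average each dimension over a batch of equally sized vectors."""
    n = len(batch)
    return [sum(col) / n for col in zip(*batch)]


def feature_matching_loss(real_feats: Sequence[Vector], fake_feats: Sequence[Vector]) -> float:
    """Squared L2 distance between mean features of real and generated batches.

    `real_feats` and `fake_feats` are assumed to be activations taken from an
    intermediate layer of the discriminator.
    """
    mu_real = mean_vector(real_feats)
    mu_fake = mean_vector(fake_feats)
    return sum((r - f) ** 2 for r, f in zip(mu_real, mu_fake))


def select_replay_samples(hidden: Sequence[Vector], k: int) -> List[int]:
    """Hypothetical selection rule: indices of the k generated samples whose
    hidden representation is closest (L1 distance) to the batch mean."""
    center = mean_vector(hidden)
    dist = [sum(abs(h - c) for h, c in zip(vec, center)) for vec in hidden]
    return sorted(range(len(hidden)), key=lambda i: dist[i])[:k]
```

In a class-incremental setting, a rule like this would typically be applied per class, so that each past class contributes its own set of replay samples.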
Our comprehensive evaluation across Windows and Android malware datasets in a
class-incremental learning scenario -- where new classes are introduced
continuously over multiple tasks -- demonstrates substantial performance
improvements over previous methods. For example, our system achieves an average
accuracy of 55% on Windows malware samples, significantly outperforming other
GR-based models by 28%. This study provides practical insights for advancing
GR-based malware classification systems. The implementation is available at
https://github.com/MalwareReplayGAN/MalCL (the code will be made public upon
the presentation of the paper).