Labels Predicted by AI
Please note that these labels were automatically added by AI. Therefore, they may not be entirely accurate.
For more details, please see the About the Literature Database page.
Abstract
Continual Learning (CL) for malware classification tackles the rapidly evolving nature of malware threats and the frequent emergence of new types. Generative Replay (GR)-based CL systems utilize a generative model to produce synthetic versions of past data, which are then combined with new data to retrain the primary model. Traditional machine learning techniques in this domain often struggle with catastrophic forgetting, where a model’s performance on old data degrades over time. In this paper, we introduce a GR-based CL system that employs Generative Adversarial Networks (GANs) with feature matching loss to generate high-quality malware samples. Additionally, we implement innovative selection schemes for replay samples based on the model’s hidden representations. Our comprehensive evaluation across Windows and Android malware datasets in a class-incremental learning scenario – where new classes are introduced continuously over multiple tasks – demonstrates substantial performance improvements over previous methods. For example, our system achieves an average accuracy of 55 GR-based models by 28 GR-based malware classification systems. The implementation is available at https://github.com/MalwareReplayGAN/MalCL1.
The code will be made public upon the presentation of the paper↩︎