Generative Active Adaptation for Drifting and Imbalanced Network Intrusion Detection

Authors: Ragini Gupta, Shinan Liu, Ruixiao Zhang, Xinyue Hu, Xiaoyang Wang, Hadjer Benkraouda, Pranav Kommaraju, Nick Feamster, Klara Nahrstedt | Published: 2025-03-04 | Updated: 2025-08-13

2025.03.042025.08.15

Authors: Ragini Gupta, Shinan Liu, Ruixiao Zhang, Xinyue Hu, Xiaoyang Wang, Hadjer Benkraouda, Pranav Kommaraju, Nick Feamster, Klara Nahrstedt
Published: 2025-03-04 | Updated: 2025-08-13

Source: https://arxiv.org/abs/2503.03022

PDF: https://arxiv.org/pdf/2503.03022

Labels Predicted by AI

Class Imbalance Active Learning Data Augmentation Method

Please note that these labels were automatically added by AI. Therefore, they may not be entirely accurate.
For more details, please see the About the Literature Database page.

Abstract

Machine learning has shown promise in network intrusion detection systems, yet its performance often degrades due to concept drift and imbalanced data. These challenges are compounded by the labor-intensive process of labeling network traffic, especially when dealing with evolving and rare attack types, which makes preparing the right data for adaptation difficult. To address these issues, we propose a generative active adaptation framework that minimizes labeling effort while enhancing model robustness. Our approach employs density-aware dataset prior selection to identify the most informative samples for annotation, and leverages deep generative models to conditionally synthesize diverse samples, thereby augmenting the training set and mitigating the effects of concept drift. We evaluate our end-to-end framework on both simulated IDS data and a real-world ISP dataset, demonstrating significant improvements in intrusion detection performance. Our method boosts the overall F1-score from 0.60 (without adaptation) to 0.86. Rare attacks such as Infiltration, Web Attack, and FTP-BruteForce, which originally achieved F1 scores of 0.001, 0.04, and 0.00, improve to 0.30, 0.50, and 0.71, respectively, with generative active adaptation in the CIC-IDS 2018 dataset. Our framework effectively enhances rare attack detection while reducing labeling costs, making it a scalable and practical solution for intrusion detection.