Deep Learning for Contextualized NetFlow-Based Network Intrusion Detection: Methods, Data, Evaluation and Deployment

Authors: Abdelkader El Mahdaouy, Issam Ait Yahia, Soufiane Oualil, Ismail Berrada | Published: 2026-02-05

2026.02.052026.02.07

Authors: Abdelkader El Mahdaouy, Issam Ait Yahia, Soufiane Oualil, Ismail Berrada
Published: 2026-02-05

Source: https://arxiv.org/abs/2602.05594

PDF: https://arxiv.org/pdf/2602.05594

Labels Predicted by AI

Anomaly Detection Graph Neural Network

Please note that these labels were automatically added by AI. Therefore, they may not be entirely accurate.
For more details, please see the About the Literature Database page.

Abstract

Network Intrusion Detection Systems (NIDS) have progressively shifted from signature-based techniques toward machine learning and, more recently, deep learning methods. Meanwhile, the widespread adoption of encryption has reduced payload visibility, weakening inspection pipelines that depend on plaintext content and increasing reliance on flow-level telemetry such as NetFlow and IPFIX. Many current learning-based detectors still frame intrusion detection as per-flow classification, implicitly treating each flow record as an independent sample. This assumption is often violated in realistic attack campaigns, where evidence is distributed across multiple flows and hosts, spanning minutes to days through staged execution, beaconing, lateral movement, and exfiltration. This paper synthesizes recent research on context-aware deep learning for flow-based intrusion detection. We organize existing methods into a four-dimensional taxonomy covering temporal context, graph or relational context, multimodal context, and multi-resolution context. Beyond modeling, we emphasize rigorous evaluation and operational realism. We review common failure modes that can inflate reported results, including temporal leakage, data splitting, dataset design flaws, limited dataset diversity, and weak cross-dataset generalization. We also analyze practical constraints that shape deployability, such as streaming state management, memory growth, latency budgets, and model compression choices. Overall, the literature suggests that context can meaningfully improve detection when attacks induce measurable temporal or relational structure, but the magnitude and reliability of these gains depend strongly on rigorous, causal evaluation and on datasets that capture realistic diversity.