Enhancing Network Intrusion Detection Systems (NIDS) with supervised Machine
Learning (ML) is difficult. ML-NIDS must be trained and evaluated, operations
requiring data in which benign and malicious samples are clearly labelled. Such
labels demand costly expert knowledge, resulting in a lack of real deployments,
as well as in papers that keep relying on the same outdated datasets. The
situation improved recently, as some efforts have disclosed their labelled
datasets. However, most past works used these datasets merely as yet another
testbed, overlooking the added potential that such availability provides.
In contrast, we promote using such existing labelled data to cross-evaluate
ML-NIDS. This approach has received only limited attention and, due to its
complexity, requires dedicated treatment. We therefore propose the first
cross-evaluation model. Our model highlights the broader range of realistic
use-cases that can be assessed via cross-evaluations, enabling the discovery of
previously unknown qualities of state-of-the-art ML-NIDS: for instance, their
detection surface can be extended at no additional labelling cost.
However, conducting such cross-evaluations is challenging. Hence, we propose
XeNIDS, the first framework for reliable cross-evaluations based on Network
Flows. By applying XeNIDS to six well-known datasets, we demonstrate the hidden
potential, but also the risks, of cross-evaluating ML-NIDS.
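To make the core idea concrete, below is a minimal sketch of cross-dataset evaluation: a classifier is trained on the labelled flows of one dataset and assessed on another. The file names, feature columns, and classifier choice are hypothetical placeholders assumed for illustration; this is not the XeNIDS framework itself, which addresses the additional reliability issues that such naive mixing raises.

```python
# Illustrative sketch of cross-evaluation: train an ML-NIDS on dataset A,
# then test it on dataset B. All names below are hypothetical assumptions.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score

# Hypothetical NetFlow features shared by both datasets.
FEATURES = ["duration", "src_bytes", "dst_bytes", "packets"]

def load_flows(path):
    """Load a labelled NetFlow dataset; 'label' is 1 (malicious) or 0 (benign)."""
    df = pd.read_csv(path)
    return df[FEATURES], df["label"]

X_a, y_a = load_flows("dataset_A.csv")  # hypothetical file names
X_b, y_b = load_flows("dataset_B.csv")

clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_a, y_a)

# Self-evaluation (the common practice) vs. cross-evaluation: the model
# trained on A is probed on B's flows, i.e., on traffic and attacks it
# never saw during training -- at no additional labelling cost.
print("F1 on A (self):", f1_score(y_a, clf.predict(X_a)))
print("F1 on B (cross):", f1_score(y_b, clf.predict(X_b)))
```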