Network Intrusion Detection Systems (NIDS) are a fundamental tool in
cybersecurity. Their ability to generalize across diverse networks is a
critical factor in their effectiveness and a prerequisite for real-world
applications. In this study, we conduct a comprehensive analysis of the
generalization of machine-learning-based NIDS through extensive
experimentation in a cross-dataset framework. We employ four machine learning
classifiers and four datasets acquired from different networks:
CIC-IDS-2017, CSE-CIC-IDS2018, LycoS-IDS2017, and LycoS-Unicas-IDS2018.
Notably, the last dataset is a novel contribution of this work, obtained by
applying corrections based on LycoS-IDS2017 to the well-known
CSE-CIC-IDS2018 dataset. The results
show nearly perfect classification performance when the models are trained and
tested on the same dataset. However, when the models are trained and tested
in a cross-dataset fashion, classification accuracy is largely comparable to
random chance, except for a few combinations of attacks and datasets. We
employ data visualization techniques to gain insight into the patterns in the
data. Our analysis reveals anomalies in the data that directly hinder the
classifiers' ability to generalize the learned knowledge to new scenarios.
This study deepens our understanding of the generalization capabilities of
machine-learning-based NIDS and highlights the importance of accounting for
data heterogeneity.
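
To make the cross-dataset protocol concrete, the sketch below shows one way such an evaluation can be organized with scikit-learn. It is a minimal sketch, assuming each dataset has already been preprocessed into numeric feature and label arrays; the four classifiers shown and the load_dataset helper are illustrative placeholders, not the study's actual configuration.

```python
# Minimal sketch of a cross-dataset evaluation: train on each dataset and
# test on every dataset, including itself. The diagonal of the resulting
# grid is the intra-dataset baseline; the off-diagonal entries measure
# generalization across networks.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import balanced_accuracy_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

DATASETS = ["CIC-IDS-2017", "CSE-CIC-IDS2018",
            "LycoS-IDS2017", "LycoS-Unicas-IDS2018"]

# Placeholder classifier set; the paper's abstract does not name the four
# classifiers used, so these are assumptions for illustration only.
CLASSIFIERS = {
    "random_forest": RandomForestClassifier(n_estimators=100),
    "decision_tree": DecisionTreeClassifier(),
    "logistic_regression": LogisticRegression(max_iter=1000),
    "knn": KNeighborsClassifier(),
}

def load_dataset(name, rng=np.random.default_rng(0)):
    """Placeholder loader: stands in for reading the preprocessed flow
    features and benign/attack labels of the named dataset. Here it just
    draws synthetic data so the sketch runs end to end."""
    X = rng.normal(size=(500, 20))
    y = rng.integers(0, 2, size=500)
    return X, y

def cross_dataset_evaluation():
    """Return a dict mapping (classifier, train set, test set) to a
    balanced-accuracy score."""
    data = {name: load_dataset(name) for name in DATASETS}
    results = {}
    for clf_name, clf in CLASSIFIERS.items():
        for train_name, (X_tr, y_tr) in data.items():
            model = clf.fit(X_tr, y_tr)
            for test_name, (X_te, y_te) in data.items():
                score = balanced_accuracy_score(y_te, model.predict(X_te))
                results[(clf_name, train_name, test_name)] = score
    return results

if __name__ == "__main__":
    for key, score in cross_dataset_evaluation().items():
        print(key, f"{score:.3f}")
```

Balanced accuracy is used in the sketch because intrusion datasets are typically heavily imbalanced toward benign traffic; the study's actual metrics may differ.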