Machine Learning (ML) techniques are becoming an invaluable support for
network intrusion detection, especially in revealing anomalous flows, which
often hide cyber-threats. Typically, ML algorithms are used to classify or
recognize data traffic on the basis of statistical features such as
inter-arrival times, packet length distribution, and mean number of flows.
Dealing with the vast diversity and number of features that typically
characterize data traffic is a hard problem. This results in the following
issues: i) the presence of so many features leads to lengthy training processes
(particularly when features are highly correlated), while prediction accuracy
does not proportionally improve; ii) some of the features may introduce bias
during the classification process, particularly those that have little relation
to the data traffic to be classified. By reducing the feature space and
retaining only the most significant features, Feature Selection (FS) addresses
both issues and becomes a crucial pre-processing step in network management and,
specifically, for the purposes of network intrusion detection. In this review
paper, we
complement other surveys in multiple ways: i) evaluating more recent datasets
(more up to date than the obsolete KDD 99) by means of a designed-from-scratch
Python-based procedure; ii) providing a synopsis of the most credited FS
approaches in the field of intrusion detection, including Multi-Objective
Evolutionary techniques; iii) carrying out various experimental analyses
covering feature correlation, time complexity, and performance. Our comparisons
offer useful
guidelines to network/security managers who are considering the incorporation
of ML concepts into network intrusion detection, where trade-offs between
performance and resource consumption are crucial.
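
To make the idea of reducing a correlated feature space concrete, the following
is a minimal sketch of a correlation-based filter, one simple form of FS. It is
not the procedure evaluated in this paper: the function names, the 0.9
threshold, and the toy flow statistics are all assumptions chosen for
illustration.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # constant column: treated as uncorrelated (a simplification)
    return cov / math.sqrt(var_x * var_y)


def drop_correlated(features, threshold=0.9):
    """Greedy filter: keep a feature only if its absolute correlation with
    every feature kept so far does not exceed the (assumed) threshold."""
    kept = []
    for name, column in features.items():
        if all(abs(pearson(column, features[k])) <= threshold for k in kept):
            kept.append(name)
    return kept


# Made-up per-flow statistics, for illustration only.
flows = {
    "mean_iat": [0.10, 0.90, 0.20, 0.80, 0.50],
    "mean_pkt_len": [500, 520, 60, 64, 300],
    "total_bytes": [5000, 5200, 600, 640, 3000],  # exactly 10 * mean_pkt_len
    "flow_count": [3, 4, 9, 2, 7],
}

selected = drop_correlated(flows)
print(selected)
```

In this toy input, `total_bytes` is a scaled copy of `mean_pkt_len`, so the
filter discards it and keeps the other three features. A filter of this kind
looks only at redundancy among features and ignores their relation to the class
label, which is one reason more elaborate FS approaches exist.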