These labels were automatically added by AI and may be inaccurate. For details, see About Literature Database.
Abstract
Supervised machine learning techniques rely on labeled data to achieve high
task performance, but this requires the labels to capture some meaningful
differences in the underlying data structure. For training network intrusion
detection algorithms, most datasets contain a series of attack classes and a
single large benign class which captures all non-attack network traffic. A
review of intrusion detection papers and guides that explicitly state their
data preprocessing steps identified that the majority took the labeled
categories of the dataset at face value when training their algorithms. The
present paper evaluates the structure of benign traffic in several common
intrusion detection datasets (NSL-KDD, UNSW-NB15, and CIC-IDS 2017) and
determines whether there are meaningful sub-categories within this traffic
which may improve overall multi-classification performance using common machine
learning techniques. We present an overview of some unsupervised clustering
techniques (e.g., HDBSCAN, Mean Shift Clustering) and show how they
differentially cluster the benign traffic space.