Recurrent Neural Networks (RNNs) offer attractive properties for constructing
Intrusion Detection Systems (IDSs) for network data. With the rise of
ubiquitous Machine Learning (ML) systems, malicious actors have been quick to
find new ways of exploiting ML vulnerabilities for profit. Recently developed
adversarial ML techniques focus on computer vision, and their applicability to
network traffic is not straightforward: network packets expose fewer features
than an image, are sequential, and impose several constraints on their
features.
We show that despite these completely different characteristics, adversarial
samples can be generated reliably for RNNs. To understand a classifier's
potential for misclassification, we extend existing explainability techniques
and propose new ones that are particularly suited to sequential data. Applying
them shows that the first packets of a communication flow are already of
crucial importance and are likely to be targeted by attackers. Feature importance
methods show that even relatively unimportant features can be effectively
abused to generate adversarial samples. Since traditional evaluation metrics
such as accuracy are insufficient for quantifying the adversarial threat, we
propose the Adversarial Robustness Score (ARS), a metric that captures a common
notion of adversarial robustness, for comparing IDSs, and show that an
adversarial training procedure can significantly reduce the attack surface.