Abstract
Machine Learning (ML) has become a valuable asset to solve many real-world
tasks. For Network Intrusion Detection (NID), however, scientific advances in
ML are still seen with skepticism by practitioners. This disconnection is due
to the intrinsically limited scope of research papers, many of which primarily
aim to demonstrate new methods "outperforming" prior work, oftentimes
overlooking the practical implications for deploying the proposed solutions in
real systems. Unfortunately, the value of ML for NID depends on a plethora of
factors, such as hardware, that are often neglected in scientific literature.
This paper aims to reduce the practitioners' skepticism towards ML for NID by
"changing" the evaluation methodology adopted in research. After elucidating
which "factors" influence the operational deployment of ML in NID, we propose
the notion of "pragmatic assessment", which enables practitioners to gauge the
real value of ML methods for NID. Then, we show that the state-of-research
hardly allows one to estimate the value of ML for NID. As a constructive step
forward, we carry out a pragmatic assessment. We re-assess existing ML methods
for NID, focusing on the classification of malicious network traffic, and
consider: hundreds of configuration settings; diverse adversarial scenarios;
and four hardware platforms. Our large and reproducible evaluations enable
estimating the quality of ML for NID. We also validate our claims through a
user study with security practitioners.
External Datasets
CTU13
NB15
UF-NB15
CICIDS17
GTCS
arxiv
Cited by 1
Reviewer Integration and Performance Measurement for Malware Detection
Brad Miller, Alex Kantchelian, Michael Carl Tschantz, Sadia Afroz, Rekha Bachwani, Riyaz Faizullabhoy, Ling Huang, Vaishaal Shankar, Tony Wu, George Yiu, Anthony D. Joseph, J. D. Tygar
Published: 10.26.2015
We present and evaluate a large-scale malware detection system integrating
machine learning with expert reviewers, treating reviewers as a limited
labeling resource. We demonstrate that even in small numbers, reviewers can
vastly improve the system's ability to keep pace with evolving threats. We
conduct our evaluation on a sample of VirusTotal submissions spanning 2.5 years
and containing 1.1 million binaries with 778 GB of raw feature data. Without
reviewer assistance, we achieve 72% detection at a 0.5% false positive rate,
performing comparably to the best vendors on VirusTotal. Given a budget of 80
accurate reviews daily, we improve detection to 89% and are able to detect 42%
of malicious binaries undetected upon initial submission to VirusTotal.
Additionally, we identify a previously unnoticed temporal inconsistency in the
labeling of training datasets. We compare the impact of training labels
obtained at the same time training data is first seen with training labels
obtained months later. We find that using training labels obtained well after
samples appear, and thus unavailable in practice for current training data,
inflates measured detection by almost 20 percentage points. We release our
cluster-based implementation, as well as a list of all hashes in our evaluation
and 3% of our entire dataset.
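
The headline figures in this abstract are detection rates reported at a fixed false positive rate (for example, 72% detection at a 0.5% false positive rate). The following is a minimal sketch of how such a figure can be computed from classifier scores. It is not the authors' implementation: the function name `detection_at_fpr`, the strict `score > threshold` flagging rule, and the toy data are illustrative assumptions.

```python
def detection_at_fpr(scores, labels, max_fpr):
    """Return (threshold, detection_rate) at a false positive budget.

    scores: classifier outputs, higher means more likely malicious.
    labels: 1 for malicious, 0 for benign.
    max_fpr: largest tolerated fraction of benign samples flagged.
    """
    benign = sorted((s for s, y in zip(scores, labels) if y == 0), reverse=True)
    malicious = [s for s, y in zip(scores, labels) if y == 1]
    if not benign or not malicious:
        raise ValueError("need at least one benign and one malicious sample")

    # Number of benign samples that may be flagged within the budget (floored).
    allowed_fp = int(max_fpr * len(benign))

    # A sample is flagged when its score is strictly above the threshold.
    # Using the (allowed_fp + 1)-th highest benign score as the threshold
    # flags at most allowed_fp benign samples, even when scores are tied.
    if allowed_fp < len(benign):
        threshold = benign[allowed_fp]
    else:
        threshold = float("-inf")

    detected = sum(1 for s in malicious if s > threshold)
    return threshold, detected / len(malicious)


# Toy data (hypothetical): four malicious and four benign samples.
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
labels = [1, 1, 0, 1, 1, 0, 0, 0]
threshold, rate = detection_at_fpr(scores, labels, max_fpr=0.25)
print(f"threshold={threshold}, detection rate={rate:.0%}")
```

The threshold is chosen on benign scores only, so the false positive budget holds by construction. To probe the temporal labeling issue described above, the same function can be run twice on the same scores, once with labels available at training time and once with labels collected months later, and the two detection rates compared.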