Large Language Models (LLMs) have revolutionised natural language processing
tasks, particularly as chat agents. However, their applicability to threat
detection problems remains unclear. This paper examines the feasibility of
employing LLMs as a Network Intrusion Detection System (NIDS). Despite their
high computational requirements, LLMs are attractive primarily for their
explainability; moreover, the considerable resources already invested in their
development may translate into utility for NIDS. Current state-of-the-art NIDS rely on
artificial benchmarking datasets, resulting in skewed performance when applied
to real-world networking environments. Therefore, we compare the GPT-4 and
Llama 3 models against traditional architectures and transformer-based models to
assess their ability to detect malicious NetFlows without depending on
artificially skewed datasets, but solely on their vast pre-trained acquired
knowledge. Our preliminary exploration shows that LLMs are unfit for precise
detection of malicious NetFlows. Most promisingly, however, they exhibit
significant potential as complementary agents in NIDS, particularly in
providing explanations and aiding in threat response when integrated with
Retrieval-Augmented Generation (RAG) and function-calling capabilities.
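As a rough illustration of the zero-shot setting described above, the sketch below renders a NetFlow record as a classification prompt that asks a model to rely solely on its pre-trained knowledge. The field names and prompt wording are assumptions for illustration, not the paper's actual prompt or feature set.

```python
# Hypothetical sketch: formatting a NetFlow record as a zero-shot
# classification prompt for an LLM-based NIDS. Field names follow common
# NetFlow conventions but are illustrative assumptions.

def build_nids_prompt(flow: dict) -> str:
    """Render a NetFlow record as a zero-shot benign/malicious prompt."""
    fields = "\n".join(f"- {key}: {value}" for key, value in flow.items())
    return (
        "You are a network security analyst. Using only your pre-trained "
        "knowledge, classify the following NetFlow record as 'benign' or "
        "'malicious' and briefly explain your reasoning.\n\n"
        f"NetFlow record:\n{fields}\n\nAnswer:"
    )

example_flow = {
    "IPV4_SRC_ADDR": "10.0.0.5",
    "IPV4_DST_ADDR": "172.16.0.9",
    "L4_DST_PORT": 22,
    "PROTOCOL": 6,  # TCP
    "IN_BYTES": 1200,
    "OUT_BYTES": 300,
    "FLOW_DURATION_MS": 45,
}

prompt = build_nids_prompt(example_flow)
print(prompt)
```

The resulting string would be sent to a chat model (e.g., GPT-4 or Llama 3); in the complementary-agent setting, the same prompt could be augmented with RAG-retrieved threat intelligence before the model is asked to explain or respond.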
External datasets
NF-UNSW-NB15-v2
NF-CSE-CIC-IDS2018-v2
References
Benchmarking the Benchmark – Analysis of Synthetic NIDS Datasets