Abstract
Large Language Models (LLMs) are set to reshape cybersecurity by augmenting
red and blue team operations. Red teams can exploit LLMs to plan attacks, craft
phishing content, simulate adversaries, and generate exploit code. Conversely,
blue teams may deploy them for threat intelligence synthesis, root cause
analysis, and streamlined documentation. This dual capability introduces both
transformative potential and serious risks.
This position paper maps LLM applications across cybersecurity frameworks
such as MITRE ATT&CK and the NIST Cybersecurity Framework (CSF), offering a
structured view of their current utility and limitations. While LLMs
demonstrate fluency and versatility across various tasks, they remain fragile
in high-stakes, context-heavy environments. Key limitations include
hallucinations, limited context retention, weak reasoning, and prompt
sensitivity, all of which undermine their reliability in operational settings.
Moreover, real-world integration raises concerns around dual-use risks,
adversarial misuse, and diminished human oversight. Malicious actors could
exploit LLMs to automate reconnaissance, obfuscate attack vectors, and lower
the technical threshold for executing sophisticated attacks.
To ensure safer adoption, we recommend maintaining human-in-the-loop
oversight, enhancing model explainability, integrating privacy-preserving
mechanisms, and building systems robust to adversarial exploitation. As
organizations increasingly adopt AI-driven cybersecurity, a nuanced
understanding of LLMs' risks and operational impacts is critical to securing
their defensive value while mitigating unintended consequences.