Abstract
As cyber threats continue to grow in scale and sophistication, blue team
defenders increasingly require advanced tools to proactively detect and
mitigate risks. Large Language Models (LLMs) offer promising capabilities for
enhancing threat analysis. However, their effectiveness in real-world blue team
threat-hunting scenarios remains insufficiently explored. This paper presents
CyberTeam, a benchmark designed to guide LLMs in blue teaming practice.
CyberTeam constructs a standardized workflow in two stages. First, it models
realistic threat-hunting workflows by capturing the dependencies among
analytical tasks from threat attribution to incident response. Next, each task
is addressed through a set of operational modules tailored to its specific
analytical requirements. This transforms threat hunting into a structured
sequence of reasoning steps, with each step grounded in a discrete operation
and ordered according to task-specific dependencies. Guided by this framework,
LLMs are directed to perform threat-hunting tasks through modularized steps.
Overall, CyberTeam integrates 30 tasks and 9 operational modules to guide LLMs
through standardized threat analysis. We evaluate both leading LLMs and
state-of-the-art cybersecurity agents, comparing CyberTeam against open-ended
reasoning strategies. Our results highlight the improvements enabled by
standardized design, while also revealing the limitations of open-ended
reasoning in real-world threat hunting.
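The two-stage idea described above (order tasks by their dependencies, then expand each task into its operational modules) can be illustrated with a minimal sketch. This is not the benchmark's implementation: the task names, module names, and the `build_reasoning_steps` helper are all hypothetical, chosen only to show how a dependency graph might be flattened into a sequence of reasoning steps.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# ASSUMPTION: illustrative task graph; each task maps to the tasks it depends on.
# These names are hypothetical and are not taken from the CyberTeam benchmark.
TASK_DEPENDENCIES = {
    "threat_attribution": set(),
    "behavior_analysis": {"threat_attribution"},
    "vulnerability_assessment": {"behavior_analysis"},
    "incident_response": {"behavior_analysis", "vulnerability_assessment"},
}

# ASSUMPTION: illustrative mapping from each task to its ordered operational modules.
TASK_MODULES = {
    "threat_attribution": ["extract_entities", "map_to_knowledge_base"],
    "behavior_analysis": ["extract_entities", "summarize"],
    "vulnerability_assessment": ["map_to_knowledge_base", "score_severity"],
    "incident_response": ["summarize", "recommend_mitigation"],
}


def build_reasoning_steps(dependencies, modules):
    """Flatten a task graph into an ordered list of (task, module) steps.

    Tasks are ordered so that every task appears after the tasks it depends on;
    within a task, modules keep the order in which they are listed.
    """
    task_order = TopologicalSorter(dependencies).static_order()
    return [(task, module) for task in task_order for module in modules[task]]


steps = build_reasoning_steps(TASK_DEPENDENCIES, TASK_MODULES)
for index, (task, module) in enumerate(steps, start=1):
    print(f"step {index}: task={task} module={module}")
```

In a sketch like this, each `(task, module)` pair would correspond to one discrete step handed to the LLM, in contrast to an open-ended prompt that leaves the ordering to the model.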
External Datasets
MITRE CVE database
NVD (National Vulnerability Database)
Exploit-DB
D3FEND
Oracle Security Alerts
Red Hat Bugzilla
RHSA (Red Hat Security Advisories)
IBM X-Force Exchange
CISE (Cybersecurity Information Sharing Environment)