Abstract
Sharing methods of attack and their effectiveness is a cornerstone of
building robust defensive systems. Threat analysis reports, produced by various
individuals and organizations, play a critical role in supporting security
operations and combating emerging threats. To enhance the timeliness and
automation of threat intelligence sharing, several standards have been
established, with the Structured Threat Information Expression (STIX) framework
emerging as one of the most widely adopted. However, generating STIX-compatible
data from unstructured security text remains a largely manual, expert-driven
process. To address this challenge, we introduce AZERG, a tool that helps
security analysts automatically generate structured STIX representations.
AZERG adapts general-purpose large language models to the specific task of
extracting STIX-formatted threat data. To manage the complexity, we divide the
task into four subtasks: entity detection (T1), entity type identification
(T2), related pair detection (T3), and relationship type identification (T4).
We apply task-specific fine-tuning to
accurately extract relevant entities and infer their relationships in
accordance with the STIX specification. To address the lack of training data,
we compiled a comprehensive dataset with 4,011 entities and 2,075 relationships
extracted from 141 full threat analysis reports, all annotated in alignment
with the STIX standard. Our models achieved F1-scores of 84.43% for T1, 88.49%
for T2, 95.47% for T3, and 84.60% for T4 in real-world scenarios. We validated
their performance against a range of open- and closed-parameter models, as well
as state-of-the-art methods, demonstrating improvements of 2-25% across tasks.
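
The four-subtask decomposition (T1 to T4) can be pictured as a simple staged pipeline. The sketch below is illustrative only and is not AZERG's implementation: the stage functions, the `Entity` and `Relationship` classes, the toy vocabulary, and the sample sentence are all hypothetical stand-ins for the fine-tuned models and real reports. It assumes each stage is a callable and that relationships are directed, so ordered entity pairs are examined.

```python
from dataclasses import dataclass
from itertools import permutations
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Entity:
    text: str
    stix_type: str  # e.g. "malware", "intrusion-set"


@dataclass(frozen=True)
class Relationship:
    source: Entity
    target: Entity
    relationship_type: str  # e.g. "uses"


def extract(
    text: str,
    detect_entities: Callable[[str], List[str]],            # T1
    classify_entity: Callable[[str, str], str],             # T2
    is_related: Callable[[str, Entity, Entity], bool],      # T3
    classify_relation: Callable[[str, Entity, Entity], str] # T4
) -> Tuple[List[Entity], List[Relationship]]:
    """Run the four subtasks in order and collect entities and relationships."""
    spans = detect_entities(text)                                        # T1
    entities = [Entity(s, classify_entity(text, s)) for s in spans]      # T2
    relationships = []
    for src, dst in permutations(entities, 2):  # ordered pairs: direction matters
        if is_related(text, src, dst):                                   # T3
            rel_type = classify_relation(text, src, dst)                 # T4
            relationships.append(Relationship(src, dst, rel_type))
    return entities, relationships


# Toy stand-ins for the models (hypothetical names and rules, not real data).
VOCAB = {"ExampleGroup": "intrusion-set", "ExampleRAT": "malware"}


def toy_detect(text: str) -> List[str]:
    return [name for name in VOCAB if name in text]


def toy_classify_entity(text: str, span: str) -> str:
    return VOCAB[span]


def toy_is_related(text: str, src: Entity, dst: Entity) -> bool:
    return (src.stix_type == "intrusion-set"
            and dst.stix_type == "malware"
            and " uses " in text)


def toy_classify_relation(text: str, src: Entity, dst: Entity) -> str:
    return "uses"


entities, relationships = extract(
    "ExampleGroup uses ExampleRAT against its targets.",
    toy_detect, toy_classify_entity, toy_is_related, toy_classify_relation,
)
```

Splitting detection from typing (T1 and T2) and pair detection from relationship typing (T3 and T4) means each stage can be evaluated and swapped independently, which is consistent with the per-task F1-scores reported above.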