From Text to Actionable Intelligence: Automating STIX Entity and Relationship Extraction

Source

https://arxiv.org/abs/2507.16576

PDF

https://arxiv.org/pdf/2507.16576

Paper Information

Authors: Ahmed Lekssays, Husrev Taha Sencar, Ting Yu
Published: 7-22-2025
Affiliation: Qatar Computing Research Institute
Country: Qatar
Conference: International Symposium on Recent Advances in Intrusion Detection (RAID)

Labels Estimated by AI

Indirect Prompt Injection, Threat modeling, Attack Method

These labels were automatically added by AI and may be inaccurate.

Abstract

Sharing methods of attack and their effectiveness is a cornerstone of building robust defensive systems. Threat analysis reports, produced by various individuals and organizations, play a critical role in supporting security operations and combating emerging threats. To enhance the timeliness and automation of threat intelligence sharing, several standards have been established, with the Structured Threat Information Expression (STIX) framework emerging as one of the most widely adopted. However, generating STIX-compatible data from unstructured security text remains a largely manual, expert-driven process. To address this challenge, we introduce AZERG, a tool designed to assist security analysts in automatically generating structured STIX representations. To achieve this, we adapt general-purpose large language models for the specific task of extracting STIX-formatted threat data. To manage the complexity, the task is divided into four subtasks: entity detection (T1), entity type identification (T2), related pair detection (T3), and relationship type identification (T4). We apply task-specific fine-tuning to accurately extract relevant entities and infer their relationships in accordance with the STIX specification. To address the lack of training data, we compiled a comprehensive dataset with 4,011 entities and 2,075 relationships extracted from 141 full threat analysis reports, all annotated in alignment with the STIX standard. Our models achieved F1-scores of 84.43% for T1, 88.49% for T2, 95.47% for T3, and 84.60% for T4 in real-world scenarios. We validated their performance against a range of open- and closed-parameter models, as well as state-of-the-art methods, demonstrating improvements of 2-25% across tasks.
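
The four-subtask decomposition described in the abstract can be pictured as a small pipeline. The sketch below is an illustration only and is not AZERG's implementation: the names `Entity`, `Relationship`, `extract`, and `to_stix_like` are hypothetical, the four stage functions are toy stand-ins for what the paper handles with fine-tuned language models, and the report text and entity names are made up. The output dictionaries imitate the shape of STIX objects but omit fields a conforming STIX 2.1 object would need (for example `spec_version`, `created`, and `modified`).

```python
import itertools
import uuid
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Entity:
    text: str
    stix_type: str  # e.g. "malware", "threat-actor"


@dataclass(frozen=True)
class Relationship:
    source: Entity
    target: Entity
    relationship_type: str  # e.g. "uses"


def extract(
    report: str,
    detect_entities: Callable[[str], List[str]],                  # T1
    identify_type: Callable[[str, str], str],                     # T2
    is_related: Callable[[str, Entity, Entity], bool],            # T3
    identify_relationship: Callable[[str, Entity, Entity], str],  # T4
) -> Tuple[List[Entity], List[Relationship]]:
    """Chain the four subtasks; each stage is a pluggable callable."""
    # T1 finds entity mentions, T2 assigns each one a type.
    entities = [Entity(span, identify_type(report, span))
                for span in detect_entities(report)]
    relationships = []
    # T3 filters ordered entity pairs; T4 labels only the pairs that pass.
    for src, dst in itertools.permutations(entities, 2):
        if is_related(report, src, dst):
            label = identify_relationship(report, src, dst)
            relationships.append(Relationship(src, dst, label))
    return entities, relationships


def to_stix_like(entities: List[Entity],
                 relationships: List[Relationship]) -> List[dict]:
    """Emit simplified STIX-shaped dictionaries (not fully conforming)."""
    ids = {e: f"{e.stix_type}--{uuid.uuid4()}" for e in entities}
    objects = [{"type": e.stix_type, "id": ids[e], "name": e.text}
               for e in entities]
    for r in relationships:
        objects.append({
            "type": "relationship",
            "id": f"relationship--{uuid.uuid4()}",
            "relationship_type": r.relationship_type,
            "source_ref": ids[r.source],
            "target_ref": ids[r.target],
        })
    return objects


# Toy stand-ins for the four stages (assumptions; made-up entity names).
KNOWN = {"APT-X": "threat-actor", "FooLoader": "malware"}


def toy_detect(report: str) -> List[str]:
    return [name for name in KNOWN if name in report]


def toy_type(report: str, span: str) -> str:
    return KNOWN[span]


def toy_related(report: str, a: Entity, b: Entity) -> bool:
    return a.stix_type == "threat-actor" and b.stix_type == "malware"


def toy_relationship(report: str, a: Entity, b: Entity) -> str:
    return "uses"


if __name__ == "__main__":
    text = "APT-X deployed FooLoader against several targets."
    ents, rels = extract(text, toy_detect, toy_type,
                         toy_related, toy_relationship)
    for obj in to_stix_like(ents, rels):
        print(obj)
```

Running relationship typing (T4) only on pairs that pass related-pair detection (T3) mirrors the ordering of subtasks in the abstract; how AZERG actually prompts or batches its models at each stage is not stated in this entry.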

External Datasets

AZERG Data

AnnoCTRPlus