Organizations are increasingly targeted by Advanced Persistent Threats
(APTs), which involve complex, multi-stage tactics and diverse techniques.
Cyber Threat Intelligence (CTI) sources, such as incident reports and security
blogs, provide valuable insights but are often unstructured and written in
natural language, making automated information extraction difficult. Recent
studies have explored the use of AI to perform automatic extraction from CTI
data, leveraging existing CTI datasets for performance evaluation and
fine-tuning. However, these datasets have limitations that reduce their
effectiveness. To overcome these issues, we introduce a novel dataset manually
constructed from CTI reports and structured according to the MITRE ATT&CK
framework. To assess its quality, we conducted an inter-annotator agreement
study using Krippendorff's alpha, confirming its reliability. Furthermore, the
dataset was used to evaluate a Large Language Model (LLM) in a real-world
business context, showing promising generalizability.
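
As an illustration of the agreement measure named above, the following is a minimal sketch of Krippendorff's alpha for nominal labels. It assumes each annotated item carries one label per annotator (for example, a MITRE ATT&CK technique ID) and that missing annotations are marked with None. The function name, the data layout, and the sample labels are hypothetical and are not taken from the dataset or the study's actual tooling.

```python
from collections import Counter
from itertools import permutations


def krippendorff_alpha_nominal(units):
    """Krippendorff's alpha for nominal data.

    `units` is a list of items; each item is a list with one label per
    annotator, using None for a missing annotation (assumed layout).
    """
    # Coincidence matrix: every ordered pair of labels given to the same
    # item by different annotators contributes 1 / (m - 1), where m is the
    # number of labels available for that item.
    coincidences = Counter()
    for ratings in units:
        values = [r for r in ratings if r is not None]
        m = len(values)
        if m < 2:
            continue  # items with fewer than two labels are not pairable
        for a, b in permutations(values, 2):
            coincidences[(a, b)] += 1.0 / (m - 1)

    # Marginal totals per label and the total number of pairable labels.
    totals = Counter()
    for (a, _), weight in coincidences.items():
        totals[a] += weight
    n = sum(totals.values())
    if n <= 1:
        raise ValueError("need at least one item with two or more labels")

    # Observed and expected disagreement, both scaled by the same factor n,
    # which cancels in the ratio.
    observed = sum(w for (a, b), w in coincidences.items() if a != b)
    expected = sum(
        totals[a] * totals[b] for a in totals for b in totals if a != b
    ) / (n - 1)
    if expected == 0:
        raise ValueError("alpha is undefined when only one label is used")
    return 1.0 - observed / expected


# Hypothetical example: two annotators labelling four report sentences.
sample = [
    ["T1566", "T1566"],
    ["T1566", "T1059"],
    ["T1059", "T1059"],
    ["T1059", "T1059"],
]
alpha = krippendorff_alpha_nominal(sample)
```

A value of 1 indicates perfect agreement and a value near 0 indicates agreement no better than chance. Treating the labels as nominal is an assumption of this sketch; a hierarchical scheme such as tactics and techniques could justify a different distance function.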