Abstract
Textual descriptions in cyber threat intelligence (CTI) reports, such as
security articles and news, are rich sources of knowledge about cyber threats,
crucial for organizations to stay informed about the rapidly evolving threat
landscape. However, current CTI knowledge extraction methods lack flexibility
and generalizability, often resulting in inaccurate and incomplete knowledge
extraction. Syntax parsing relies on fixed rules and dictionaries, while model
fine-tuning requires large annotated datasets, making both paradigms
challenging to adapt to new threats and ontologies. To bridge this gap, we
propose CTINexus, a novel framework leveraging optimized in-context learning
(ICL) of large language models (LLMs) for data-efficient CTI knowledge
extraction and high-quality cybersecurity knowledge graph (CSKG) construction.
Unlike existing methods, CTINexus requires neither extensive data nor parameter
tuning and can adapt to various ontologies with minimal annotated examples.
This is achieved through: (1) a carefully designed automatic prompt
construction strategy with optimal demonstration retrieval for extracting a
wide range of cybersecurity entities and relations; (2) a hierarchical entity
alignment technique that canonicalizes the extracted knowledge and removes
redundancy; and (3) a long-distance relation prediction technique that further
completes the CSKG with missing links. Our extensive evaluations using 150
real-world CTI reports collected from 10 platforms demonstrate that CTINexus
significantly outperforms existing methods in constructing accurate and
complete CSKGs, highlighting its potential to transform CTI analysis with an
efficient and adaptable solution for the dynamic threat landscape.
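The abstract does not say how demonstrations are retrieved for the prompt. A
common pattern for demonstration retrieval in in-context learning is to rank a
pool of annotated examples by similarity to the input report and place the
top-k examples in the prompt ahead of the query. The sketch below illustrates
that general pattern only, not the CTINexus implementation: it assumes
bag-of-words cosine similarity as a stand-in for whatever similarity measure
the framework actually uses, and the function names, prompt wording, and
example data are all hypothetical.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lowercased token counts; a crude stand-in (assumption) for a learned embedding."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve_demonstrations(query, pool, k=2):
    """Return the k annotated examples whose text is most similar to the query."""
    q = bag_of_words(query)
    ranked = sorted(
        pool, key=lambda ex: cosine(q, bag_of_words(ex["text"])), reverse=True
    )
    return ranked[:k]


def build_prompt(query, demos):
    """Hypothetical ICL prompt: instruction, retrieved demonstrations, then the query."""
    parts = ["Extract (subject, relation, object) triplets from the CTI text."]
    for ex in demos:
        parts.append(f"Text: {ex['text']}\nTriplets: {ex['triplets']}")
    parts.append(f"Text: {query}\nTriplets:")
    return "\n\n".join(parts)


# Illustrative (made-up) annotated pool and query.
pool = [
    {"text": "APT28 used phishing emails to deliver X-Agent malware.",
     "triplets": "(APT28, uses, phishing emails)"},
    {"text": "The ransomware encrypts files and demands payment in Bitcoin.",
     "triplets": "(ransomware, encrypts, files)"},
    {"text": "Lazarus Group deployed malware via spear-phishing emails.",
     "triplets": "(Lazarus Group, deploys, malware)"},
]
query = "APT29 sent phishing emails that delivered a malware loader."

demos = retrieve_demonstrations(query, pool, k=2)
prompt = build_prompt(query, demos)
print(prompt)
```

Here the two phishing-related examples are selected and the unrelated
ransomware example is left out, which is the intended effect of
similarity-based retrieval: demonstrations that resemble the input give the
model more relevant patterns to imitate.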