Abstract
Cyber-attack attribution is an important process that allows experts to put
in place attacker-oriented countermeasures and legal actions. Analysts
mainly perform attribution manually, given the complex nature of this task. AI
and, more specifically, Natural Language Processing (NLP) techniques can be
leveraged to support cybersecurity analysts during the attribution process.
However powerful these techniques are, they are hampered by the lack of
datasets in the attack attribution domain. In this work, we fill this gap
and provide, to the best of our knowledge, the first dataset on
cyber-attack attribution. We designed our dataset with the primary goal of
extracting attack attribution information from cybersecurity texts, utilizing
named entity recognition (NER) methodologies from the field of NLP. Unlike
other cybersecurity NER datasets, ours offers a rich set of annotations with
contextual details, including some that span phrases and sentences. We
conducted extensive experiments and applied NLP techniques to demonstrate the
dataset's effectiveness for attack attribution. These experiments highlight the
potential of Large Language Models (LLMs) to improve NER performance on
cybersecurity datasets for cyber-attack attribution.