Abstract
In the constantly evolving field of cybersecurity, it is imperative for
analysts to stay abreast of the latest attack trends and pertinent information
that aids in the investigation and attribution of cyber-attacks. In this work,
we introduce the first question-answering (QA) model, and its application, that
provides cybersecurity experts with information about cyber-attack
investigation and attribution. Our QA model is based on Retrieval-Augmented
Generation (RAG) techniques combined with a Large Language Model (LLM), and it
answers users' queries using either our knowledge base (KB), which contains
curated information about cyber-attack investigation and attribution, or
external resources provided by the users. We have tested and
evaluated our QA model with various question types, including questions based
on the KB, on its metadata, on specific documents from the KB, and on external
sources. We compared the answers for KB-based questions with those from
OpenAI's GPT-3.5 and the latest GPT-4o LLMs. Our proposed QA model outperforms
OpenAI's GPT models by providing the sources of its answers and by mitigating
the hallucinations those models exhibit, which is critical for cyber-attack
investigation and attribution. Additionally, our analysis showed that the RAG
QA model generates better answers when it is given few-shot examples along
with the query rather than zero-shot instructions alone.
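The RAG pattern the abstract describes can be sketched as follows. This is a minimal illustration, not the authors' implementation: the KB entries, the word-overlap retriever, and the prompt layout are assumptions made for the example, and the LLM call is stubbed so that the returned answer stays traceable to its source document.

```python
# Toy RAG QA loop: retrieve a KB passage, build a prompt with optional
# few-shot examples, and return an answer that cites its source.

# Hypothetical curated knowledge base: (source_id, text) pairs.
KB = [
    ("report-001", "The Lazarus group is attributed to attacks on financial institutions."),
    ("report-002", "Spear-phishing emails were the initial access vector in the 2023 campaign."),
    ("report-003", "Indicators of compromise include domains registered shortly before the attack."),
]

def retrieve(query, kb, top_k=1):
    """Rank KB entries by word overlap with the query (toy retriever)."""
    q_words = set(query.lower().split())
    scored = [(len(q_words & set(text.lower().split())), src, text)
              for src, text in kb]
    scored.sort(reverse=True)
    return [(src, text) for score, src, text in scored[:top_k] if score > 0]

def build_prompt(query, passages, few_shot=None):
    """Assemble the prompt: few-shot examples first, then retrieved context."""
    parts = []
    if few_shot:  # per the abstract, few-shot examples improved answers
        parts += [f"Q: {q}\nA: {a}" for q, a in few_shot]
    for src, text in passages:
        parts.append(f"Context [{src}]: {text}")
    parts.append(f"Q: {query}\nA:")
    return "\n\n".join(parts)

def answer(query, kb, few_shot=None):
    passages = retrieve(query, kb)
    if not passages:
        return "No relevant KB entry found.", []
    prompt = build_prompt(query, passages, few_shot)
    # Stub: a real system would send `prompt` to an LLM here; we echo the
    # top passage so the provenance of the answer remains visible.
    src, text = passages[0]
    return f"{text} (source: {src})", [src]

reply, sources = answer("What was the initial access vector?", KB)
print(reply)
print(sources)
```

Attaching the source identifier to every answer is what distinguishes this setup from querying a bare LLM, and it is the property the abstract highlights for attribution work.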