Abstract
Large language models (LLMs) can be used to analyze cyber threat intelligence
(CTI) data from cybercrime forums, which contain extensive information and key
discussions about emerging cyber threats. However, the accuracy and efficiency
of LLMs for such critical tasks have yet to be thoroughly evaluated. This study
therefore assesses the performance of an LLM system built on the OpenAI
GPT-3.5-turbo model [8] for extracting CTI. A random sample of more than 700
daily conversations was drawn from three cybercrime forums (XSS, Exploit_in,
and RAMP). Using only simple natural-language instructions, the LLM system was
asked to summarize each conversation and predict 10 key CTI variables, such as
whether a large organization and/or a critical infrastructure is being
targeted.
Two coders then reviewed each conversation and evaluated whether the
information extracted by the LLM was accurate. The LLM system performed well,
with an average accuracy of 96.23%, an average precision of 90%, and an average
recall of 88.2%. Several ways to improve the model were identified, such as
helping the LLM distinguish between stories and past events and choosing verb
tenses in prompts carefully. Nevertheless, the results of this study highlight
the relevance of using LLMs for cyber threat intelligence.