LLMs Perform Poorly at Concept Extraction in Cyber-security Research Literature

arxiv

AI Security Portal bot

Information in the literature database is collected automatically.

Source

https://arxiv.org/abs/2312.07110

PDF

https://arxiv.org/pdf/2312.07110

Paper Information

Authors: Maxime Würsch; Andrei Kucharavy; Dimitri Percia David; Alain Mermoud
Published: 12-12-2023
Affiliation: Cyber-Defence Campus, armasuisse S+T
Country: Switzerland
Conference

Labels Estimated by AI

LLM Performance Evaluation; Knowledge Extraction Method; Data Preprocessing

These labels were automatically added by AI and may be inaccurate.
For details, see About Literature Database.

Abstract

The cybersecurity landscape evolves rapidly and poses threats to organizations. To enhance resilience, one needs to track the latest developments and trends in the domain. It has been demonstrated that standard bibliometric approaches show their limits in such a fast-evolving domain. To address this, we use large language models (LLMs) to extract relevant knowledge entities from cybersecurity-related texts. We use a subset of arXiv preprints on cybersecurity as our data and compare different LLMs in terms of entity recognition (ER) and relevance. The results suggest that LLMs do not produce good knowledge entities that reflect the cybersecurity context, but they show some potential for noun extractors. For this reason, we developed a noun extractor boosted with statistical analysis to extract specific and relevant compound nouns from the domain. We then tested our model on identifying trends in the LLM domain. We observe some limitations, but the approach offers promising results for monitoring the evolution of emergent trends.
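The abstract says only that the noun extractor is "boosted with statistical analysis"; it does not name the tagger or the statistic. The sketch below is a hypothetical illustration of that general idea, not the authors' method. It swaps a real part-of-speech noun extractor for a crude stopword-split heuristic, then ranks candidate compound terms by a smoothed log-ratio of their relative frequency in a domain corpus versus a general reference corpus. `STOPWORDS`, `candidate_terms`, `specificity`, and the toy corpora are all illustrative assumptions.

```python
import math
import re
from collections import Counter

# Hypothetical stopword list: a crude stand-in for a POS-based noun filter.
STOPWORDS = {
    "a", "an", "the", "of", "in", "on", "for", "to", "and", "or", "is", "are",
    "we", "our", "with", "by", "that", "this", "from", "as", "be", "against",
}

def ngrams(run, max_len):
    """All contiguous n-grams of length 2..max_len within one token run."""
    return [" ".join(run[i:i + n])
            for n in range(2, max_len + 1)
            for i in range(len(run) - n + 1)]

def candidate_terms(text, max_len=3):
    """Split text at punctuation and stopwords; emit multi-word candidates."""
    terms = []
    for chunk in re.split(r"[^\w\s-]", text.lower()):
        run = []
        for tok in chunk.split() + [None]:  # None flushes the last run
            if tok is None or tok in STOPWORDS:
                terms.extend(ngrams(run, max_len))
                run = []
            else:
                run.append(tok)
    return terms

def specificity(domain_docs, reference_docs, min_count=2):
    """Rank terms by log(domain frequency / smoothed reference frequency)."""
    dom = Counter(t for d in domain_docs for t in candidate_terms(d))
    ref = Counter(t for d in reference_docs for t in candidate_terms(d))
    dom_total = sum(dom.values()) or 1
    ref_total = sum(ref.values())
    scores = {}
    for term, n in dom.items():
        if n < min_count:
            continue  # ignore rare candidates
        p_dom = n / dom_total
        # Add-one smoothing so terms unseen in the reference stay finite.
        p_ref = (ref[term] + 1) / (ref_total + len(ref) + 1)
        scores[term] = math.log(p_dom / p_ref)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

domain = [
    "We detect phishing websites with a graph neural network in real time.",
    "A graph neural network flags phishing websites in real time.",
    "Adversarial examples evade the malware classifier.",
    "The malware classifier is hardened against adversarial examples.",
]
reference = [
    "The news is updated in real time.",
    "Prices change in real time on the old website.",
]

ranked = specificity(domain, reference)
for term, score in ranked:
    print(f"{score:+.2f}  {term}")
```

In this toy run, "real time" is frequent in both corpora and so ranks last, while domain-only compounds such as "graph neural network" and "malware classifier" rank above it. Only the ordering is meaningful here; with corpora this small the absolute scores can be negative.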

External Datasets

arXiv preprints (100k)

BookCorpus

INSPEC dataset subsample