Software vulnerabilities, caused by unintentional flaws in source code, are a
primary root cause of cyberattacks. Static analysis of source code has been
widely used to detect these unintentional defects introduced by software
developers. Large Language Models (LLMs) have demonstrated human-like
conversational abilities owing to their capacity to capture complex patterns in
sequential data such as natural language. In this paper, we harness these
capabilities to analyze source code and detect known vulnerabilities. To make
the proposed vulnerability detection method applicable across multiple
programming languages, we convert source code to LLVM intermediate
representation (IR) and train LLMs on this representation. We evaluate several
LLM architectures and compare their accuracy. Experiments on real-world and
synthetic code from the National Vulnerability Database (NVD) and the Software
Assurance Reference Dataset (SARD) show high accuracy in identifying source
code vulnerabilities.
External datasets
NVD
SARD
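Comparing architectures on data such as NVD and SARD by accuracy alone can mislead when vulnerable samples are a minority of the test set. The following helper is a hypothetical evaluation sketch, not the paper's protocol. It computes accuracy together with precision, recall, and F1 for a binary labeling (1 = vulnerable, 0 = not vulnerable). It then ranks models by accuracy and breaks ties with F1, which is an assumed ordering. The names `binary_metrics`, `rank_models`, and `BinaryMetrics` are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BinaryMetrics:
    accuracy: float
    precision: float
    recall: float
    f1: float


def binary_metrics(y_true: list[int], y_pred: list[int]) -> BinaryMetrics:
    """Metrics for a vulnerable (1) / not-vulnerable (0) classifier."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    # Define precision/recall/F1 as 0.0 when their denominators are empty.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return BinaryMetrics((tp + tn) / len(y_true), precision, recall, f1)


def rank_models(
    y_true: list[int], predictions: dict[str, list[int]]
) -> list[tuple[str, BinaryMetrics]]:
    """Sort models by accuracy, breaking ties by F1 (an assumed ordering)."""
    scored = [(name, binary_metrics(y_true, preds)) for name, preds in predictions.items()]
    return sorted(scored, key=lambda item: (item[1].accuracy, item[1].f1), reverse=True)
```

As a toy illustration, take 8 non-vulnerable and 2 vulnerable samples. A model that always predicts "not vulnerable" still scores 0.8 accuracy but 0.0 recall. Reporting recall and F1 next to accuracy exposes that gap.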