VERCATION: Precise Vulnerable Open-source Software Version Identification based on Static Analysis and LLM


arxiv

AI Security Portal bot

Information in the literature database is collected automatically.

Source

https://arxiv.org/abs/2408.07321

PDF

https://arxiv.org/pdf/2408.07321

Paper Information

Author: Yiran Cheng, Ting Zhang, Lwin Khin Shar, Shouguo Yang, Chaopeng Dong, David Lo, Shichao Lv, Zhiqiang Shi, Limin Sun
Published: 8-14-2024
Updated: 8-14-2025
Affiliation: Beijing Key Laboratory of IOT Information Security Technology, Institute of Information Engineering
Country: China
Conference: IEEE Trans. Software Eng.

Labels Estimated by AI

Code Change Analysis, Prompt Injection, Vulnerability Management

These labels were automatically added by AI and may be inaccurate.

Abstract

Open-source software (OSS) has experienced a surge in popularity, attributed to its collaborative development model and cost-effective nature. However, the adoption of specific software versions in development projects may introduce security risks when these versions bring along vulnerabilities. Current methods of identifying vulnerable versions typically analyze and extract the code features involved in vulnerability patches using static analysis with pre-defined rules. They then use code clone detection to identify the vulnerable versions. These methods are hindered by imprecision due to (1) the inclusion of vulnerability-irrelevant code in the analysis and (2) the inadequacy of code clone detection. This paper presents VERCATION, an approach designed to identify vulnerable versions of OSS written in C/C++. VERCATION combines program slicing with a Large Language Model (LLM) to identify vulnerability-relevant code from vulnerability patches. It then backtracks historical commits to gather previous modifications of the identified vulnerability-relevant code. We propose code clone detection based on expanded and normalized ASTs to compare the differences between pre-modification and post-modification code, thereby locating the vulnerability-introducing commit (vic) and enabling the identification of the vulnerable versions between the vic and the vulnerability-fixing commit. We curate a dataset linking 122 OSS vulnerabilities and 1,211 versions to evaluate VERCATION. On this dataset, our approach achieves an F1 score of 93.1%, outperforming current state-of-the-art methods. More importantly, VERCATION detected 202 incorrect vulnerable OSS versions in NVD reports.
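
The last step of the abstract, mapping a located vic and a vulnerability-fixing commit to a set of vulnerable versions, can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a strictly linear commit history (no branches or backported patches), and the function name `vulnerable_versions`, the commit ids and the version tags are all hypothetical.

```python
from typing import Dict, List


def vulnerable_versions(
    history: List[str], tags: Dict[str, str], vic: str, fix: str
) -> List[str]:
    """Return the release tags whose commit lies in the range [vic, fix).

    history: commit ids ordered oldest first (assumed linear, hypothetical ids).
    tags:    release tag -> commit id that the tag points to.
    vic:     vulnerability-introducing commit (included in the range).
    fix:     vulnerability-fixing commit (excluded from the range).
    """
    position = {commit: index for index, commit in enumerate(history)}
    start, end = position[vic], position[fix]
    if start >= end:
        raise ValueError("the introducing commit must precede the fixing commit")
    # Keep every tag released at or after the vic and before the fix.
    hits = [
        (position[commit], tag)
        for tag, commit in tags.items()
        if start <= position[commit] < end
    ]
    # Sort by position in history so the result is ordered oldest first.
    return [tag for _, tag in sorted(hits)]


# Toy data: six commits, four tagged releases (all names are made up).
history = ["c0", "c1", "c2", "c3", "c4", "c5"]
tags = {"v1.0": "c0", "v1.1": "c2", "v1.2": "c3", "v2.0": "c5"}
affected = vulnerable_versions(history, tags, vic="c1", fix="c4")
print(affected)
```

In this toy history, v1.0 predates the vic and v2.0 follows the fix, so only the releases tagged in between are reported. A real repository would need branch-aware reachability checks rather than list positions.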

External Datasets

122 OSS vulnerabilities and 1,211 versions