Abstract
Open-source software (OSS) has experienced a surge in popularity, attributed
to its collaborative development model and cost-effective nature. However, the
adoption of specific software versions in development projects may introduce
security risks when these versions bring along vulnerabilities. Current methods
of identifying vulnerable versions typically analyze and extract the code
features involved in vulnerability patches using static analysis with
pre-defined rules. They then use code clone detection to identify the
vulnerable versions. These methods are hindered by imprecision due to (1) the
inclusion of vulnerability-irrelevant code in the analysis and (2) the
inadequacy of code clone detection. This paper presents VERCATION, an approach
designed to identify vulnerable versions of OSS written in C/C++. VERCATION
combines program slicing with a Large Language Model (LLM) to identify
vulnerability-relevant code from vulnerability patches. It then backtracks
historical commits to gather previous modifications of identified
vulnerability-relevant code. We propose code clone detection based on expanded
and normalized ASTs to compare the differences between pre-modification and
post-modification code, thereby locating the vulnerability-introducing commit
(vic) and enabling the identification of the vulnerable versions between the
vulnerability-fixing commit and the vic. We curate a dataset linking 122 OSS
vulnerabilities and 1,211 versions to evaluate VERCATION. On this dataset, our
approach achieves an F1 score of 93.1%, outperforming current state-of-the-art
methods. More importantly, VERCATION detected 202 incorrect vulnerable OSS
versions in NVD reports.
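
The final step described above, reading off the vulnerable versions once the vic and the vulnerability-fixing commit are known, can be illustrated with a minimal sketch. This is not VERCATION's implementation: it assumes a linear commit history, and the function name `vulnerable_versions`, the commit ids, and the release tags are all hypothetical.

```python
from typing import Dict, List


def vulnerable_versions(
    history: List[str],
    tags: Dict[str, str],
    vic: str,
    fix: str,
) -> List[str]:
    """Return the release tags that contain `vic` but not `fix`.

    history: commit ids, oldest first (assumed linear; hypothetical input).
    tags:    release name -> commit id the release was cut from.
    """
    position = {commit: i for i, commit in enumerate(history)}
    vic_pos, fix_pos = position[vic], position[fix]
    if vic_pos >= fix_pos:
        raise ValueError("the vic must precede the fixing commit")
    # A release is affected if it was cut at or after the vic
    # and strictly before the fixing commit.
    affected = [
        version
        for version, commit in tags.items()
        if vic_pos <= position[commit] < fix_pos
    ]
    # Sort by position in history so the output follows release order.
    return sorted(affected, key=lambda v: position[tags[v]])


# Hypothetical history: the flaw is introduced in c1 and fixed in c4.
history = ["c0", "c1", "c2", "c3", "c4", "c5"]
tags = {"1.0": "c0", "1.1": "c2", "1.2": "c3", "1.3": "c5"}
print(vulnerable_versions(history, tags, vic="c1", fix="c4"))  # ['1.1', '1.2']
```

Real repositories have branches and backported fixes, so a practical tool would need commit ancestry checks rather than positions in a single list.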