Abstract
Software vulnerabilities are a serious concern. Typically, in a
program or function consisting of hundreds or thousands of source code
statements, only a few statements cause the corresponding
vulnerabilities. Most current vulnerability labelling is done at the
function or program level by experts with the assistance of machine learning
tools. Extending this labelling to the code statement level is much more
costly and time-consuming, and remains an open problem. In this paper, we propose a
novel end-to-end deep learning-based approach to identify the
vulnerability-relevant code statements of a specific function. Inspired by the
specific structures observed in real-world vulnerable code, we first leverage
mutual information to learn a set of latent variables representing the
relevance of the source code statements to the corresponding function's
vulnerability. We then propose a novel clustered spatial contrastive learning
method to further improve representation learning and the robust selection
process of vulnerability-relevant code statements. Experimental results on
real-world datasets of 200k+ C/C++ functions show that our method outperforms
other state-of-the-art baselines. In general, our method improves on the
baselines by between 3% and 14% in the VCP, VCA, and Top-10 ACC measures when
running on real-world datasets in an unsupervised setting. Our released source
code samples are publicly available at
\href{https://github.com/vannguyennd/livuitcl}{https://github.com/vannguyennd/livuitcl}.
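The following Python sketch makes the statement-level evaluation concrete by turning per-statement relevance scores into a top-K selection and a Top-K accuracy figure. It is not the authors' implementation. The scores are placeholders for the learned latent relevance variables, and the mutual-information model and clustered spatial contrastive learning are not reproduced. The function names `top_k_statements` and `top_k_accuracy` are hypothetical. The metric assumes the definition commonly used in line-level vulnerability localization: a function counts as a hit if at least one ground-truth vulnerable statement appears among its K highest-scoring statements.

```python
from typing import List, Sequence, Set


def top_k_statements(scores: Sequence[float], k: int) -> List[int]:
    """Indices of the k statements with the highest relevance scores.

    `scores[i]` stands in for the learned relevance of statement i
    (hypothetical input; the scoring model itself is not shown).
    Ties are broken by the lower statement index for determinism.
    """
    order = sorted(range(len(scores)), key=lambda i: (-scores[i], i))
    return order[:k]


def top_k_accuracy(all_scores: Sequence[Sequence[float]],
                   all_vulnerable: Sequence[Set[int]],
                   k: int = 10) -> float:
    """Fraction of functions whose top-k statements contain at least one
    ground-truth vulnerable statement (assumed Top-K ACC definition)."""
    if len(all_scores) != len(all_vulnerable):
        raise ValueError("one set of ground-truth labels is needed per function")
    if not all_scores:
        return 0.0
    hits = 0
    for scores, vulnerable in zip(all_scores, all_vulnerable):
        # A function is a hit if any selected statement is truly vulnerable.
        if set(top_k_statements(scores, k)) & vulnerable:
            hits += 1
    return hits / len(all_scores)


if __name__ == "__main__":
    # Two toy functions with made-up scores; statement 1 is the labelled
    # vulnerable statement in both.
    scores = [[0.1, 0.9, 0.3, 0.05], [0.8, 0.2, 0.7]]
    labels = [{1}, {1}]
    print(top_k_accuracy(scores, labels, k=2))  # 0.5
```

With the default `k=10` the same function yields a Top-10 figure. Sorting every function's statements costs O(n log n) in the number of statements. For very long functions, `heapq.nlargest` is a cheaper alternative, though its tie-breaking would need to be matched.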