Abstract
Program analysis tools often produce large volumes of candidate vulnerability
reports that require costly manual review, creating a practical challenge: how
can security analysts prioritize the reports most likely to be true
vulnerabilities?
This paper investigates whether machine learning can be applied to
prioritizing vulnerabilities reported by program analysis tools. We focus on
Node.js packages and collect a benchmark of 1,883 Node.js packages, each
containing one reported ACE or ACI vulnerability. We evaluate a variety of
machine learning approaches, including classical models, graph neural networks
(GNNs), large language models (LLMs), and hybrid models that combine GNNs and
LLMs, trained on data based on a dynamic program analysis tool's output. The
top LLM achieves $F_{1} {=} 0.915$, while the best GNN and classical ML models
reach $F_{1} {=} 0.904$. At a false-negative rate below 7%, the leading
model eliminates 66.9% of benign packages from manual review, taking around 60
ms per package. If the best model is tuned to operate at a precision level of
0.8 (i.e., allowing 20% false positives amongst all warnings), our approach can
detect 99.2% of exploitable taint flows while missing only 0.8%, demonstrating
strong potential for real-world vulnerability triage.
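
To make the precision-tuned operating point concrete, the following is a minimal sketch of how a decision threshold could be chosen to meet a target precision and how the resulting recall is read off. It is an illustration under stated assumptions, not the paper's implementation: the function name `threshold_for_precision`, the scores, and the labels are all hypothetical toy values.

```python
def threshold_for_precision(scores, labels, target_precision):
    """Return (threshold, precision, recall) for the lowest score threshold
    whose precision is at least target_precision, or None if none qualifies.

    scores: model confidence that a report is a true vulnerability (hypothetical).
    labels: 1 for a true vulnerability, 0 for a benign report.
    """
    total_positives = sum(labels)
    best = None
    # Try each distinct score as a threshold, from highest to lowest.
    for t in sorted(set(scores), reverse=True):
        flagged = [y for s, y in zip(scores, labels) if s >= t]
        tp = sum(flagged)
        precision = tp / len(flagged)
        recall = tp / total_positives if total_positives else 0.0
        if precision >= target_precision:
            # A lower qualifying threshold flags more reports, so its recall
            # is at least as high as any earlier qualifying threshold.
            best = (t, precision, recall)
    return best


# Toy data (made up for illustration only).
scores = [0.95, 0.90, 0.80, 0.70, 0.60, 0.40, 0.30, 0.10]
labels = [1, 1, 1, 0, 1, 0, 0, 0]

print(threshold_for_precision(scores, labels, 0.8))  # (0.6, 0.8, 1.0)
```

On this toy data, accepting one false positive among five flagged reports (precision 0.8) lets the threshold drop far enough to catch every true vulnerability, which is the same trade-off the abstract describes at a much larger scale.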