These labels were automatically added by AI and may be inaccurate. For details, see About Literature Database.
Abstract
Directed greybox fuzzing (DGF) aims to efficiently trigger bugs at specific
target locations by prioritizing seeds whose execution paths are more likely to
mutate into triggering target bugs. However, existing DGF approaches suffer
from imprecise probability calculations due to their reliance on complex
distance metrics derived from static analysis. The over-approximations inherent
in static analysis cause a large number of irrelevant execution paths to be
mistakenly considered to potentially mutate into triggering target bugs,
significantly reducing fuzzing efficiency. We propose to replace static
analysis-based distance metrics with precise call stack representations. Call
stacks represent precise control flows, thereby avoiding false information in
static analysis. We leverage large language models (LLMs) to predict
vulnerability-triggering call stacks for guiding seed prioritization. Our
approach constructs call graphs through static analysis to identify methods
that can potentially reach target locations, then utilizes LLMs to predict the
most likely call stack sequence that triggers the vulnerability. Seeds whose
execution paths have higher overlap with the predicted call stack are
prioritized for mutation. This is the first work to integrate LLMs into the
core seed prioritization mechanism of DGF. We implement our approach and
evaluate it against several state-of-the-art fuzzers. On a suite of real-world
programs, our approach triggers vulnerabilities $1.86\times$ to $3.09\times$
faster compared to baselines. In addition, our approach identifies 10 new
vulnerabilities and 2 incomplete fixes in the latest versions of programs used
in our controlled experiments through directed patch testing, with 10 assigned
CVE IDs.