These labels were automatically added by AI and may be inaccurate. For details, see About Literature Database.
Abstract
Background: Automated Vulnerability Repair (AVR) is a fast-growing branch of
program repair. Recent studies show that large language models (LLMs)
outperform traditional techniques, extending their success beyond code
generation and fault detection.
Hypothesis: These gains may be driven by hidden factors -- "invisible hands"
such as training-data leakage or perfect fault localization -- that let an LLM
reproduce human-authored fixes for the same code.
Objective: We replicate prior AVR studies under controlled conditions by
deliberately adding errors to the reported vulnerability location in the
prompt. If LLMs merely regurgitate memorized fixes, both small and large
localization errors should yield the same number of correct patches, because
any offset should divert the model from the original fix.
Method: Our pipeline repairs vulnerabilities from the Vul4J and VJTrans
benchmarks after shifting the fault location by n lines from the ground truth.
A first LLM generates a patch, a second LLM reviews it, and we validate the
result with regression and proof-of-vulnerability tests. Finally, we manually
audit a sample of patches and estimate the error rate with the
Agresti-Coull-Wilson method.