As software becomes increasingly complex and prone to vulnerabilities,
automated vulnerability detection is critically important, yet challenging.
Given the significant successes of large language models (LLMs) in various
tasks, there is growing anticipation of their efficacy in vulnerability
detection. However, a quantitative understanding of their potential in
vulnerability detection is still missing. To bridge this gap, we introduce a
comprehensive vulnerability benchmark VulBench. This benchmark aggregates
high-quality data from a wide range of CTF (Capture-the-Flag) challenges and
real-world applications, with annotations for each vulnerable function
detailing the vulnerability type and its root cause. Through our experiments
encompassing 16 LLMs and 6 state-of-the-art (SOTA) deep learning-based models
and static analyzers, we find that several LLMs outperform traditional deep
learning approaches in vulnerability detection, revealing an untapped potential
in LLMs. This work contributes to the understanding and utilization of LLMs for
enhanced software security.