Abstract
Smart contract vulnerabilities have caused significant economic losses in
blockchain applications. Large Language Models (LLMs) offer new possibilities
for automating vulnerability detection, a traditionally time-consuming task.
However, state-of-the-art LLM-based detection solutions are often plagued by
high false-positive rates.
In this paper, we push the boundaries of existing research in two key ways.
First, our evaluation is based on Solidity v0.8, offering the most up-to-date
insights compared to prior studies that focus on older versions (v0.4). Second,
we leverage five of the latest LLMs (from different companies), ensuring
comprehensive coverage of the most advanced capabilities in the field.
We conducted a series of rigorous evaluations. Our experiments demonstrate
that a well-designed prompt can reduce the false-positive rate by over 60%.
Surprisingly, we also discovered that the recall rate for detecting certain
vulnerabilities in Solidity v0.8 has dropped to just 13% compared to earlier
versions (i.e., v0.4). Further analysis reveals the root cause of this
decline: during detection, LLMs rely on identifying changes introduced by new
libraries and frameworks.
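As a hypothetical illustration of this version gap (the contract below is our own sketch, not an example from the paper): Solidity v0.8 made overflow/underflow checks built-in, so the v0.4-era idiom of calling SafeMath disappeared from modern codebases, while `unchecked` blocks can silently reintroduce overflow. A detector keyed to v0.4 library patterns (e.g., flagging the absence of SafeMath) may therefore miss this class of bug in v0.8 code.

```solidity
pragma solidity ^0.8.0;

// Hypothetical v0.8 contract: the overflow risk hides inside an `unchecked`
// block. In v0.4-era code, the same bug would be signaled by the *absence*
// of SafeMath calls -- a cue that no longer exists in v0.8 codebases.
contract Rewards {
    mapping(address => uint256) public balance;

    function addReward(address user, uint256 amount) external {
        unchecked {
            // Compiler-inserted overflow checks are disabled here,
            // so the balance can silently wrap around.
            balance[user] += amount;
        }
    }
}
```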