Smart contract vulnerabilities have led to billions in losses, yet finding
actionable exploits remains challenging. Traditional fuzzers rely on rigid
heuristics and struggle with complex attacks, while human auditors are thorough
but slow and do not scale. Large language models (LLMs) offer a promising middle
ground, combining human-like reasoning with machine speed.
However, early studies show that simply prompting LLMs yields unverified
speculation about vulnerabilities, with high false-positive rates. To address this, we
present A1, an agentic system that transforms any LLM into an end-to-end
exploit generator. A1 provides agents with six domain-specific tools for
autonomous vulnerability discovery, from understanding contract behavior to
testing strategies on real blockchain states. All outputs are concretely
validated through execution, ensuring only profitable proof-of-concept exploits
are reported. We evaluate A1 across 36 real-world vulnerable contracts on
Ethereum and Binance Smart Chain. A1 achieves a 63% success rate on the VERITE
benchmark. Across all successful cases, A1 extracts up to \$8.59 million per
exploit and \$9.33 million in total. Through 432 experiments across six LLMs, we
show that most exploits emerge within five iterations, at a cost of
\$0.01 to \$3.59 per attempt.
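
The execution-based validation described above (propose a strategy, run it, report it only if it turns a profit) can be illustrated with a minimal sketch. This is not A1's implementation: `propose` and `execute` are hypothetical callables standing in for an LLM agent and a forked-chain test harness, and the default budget of five iterations is borrowed from the observation above purely for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Attempt:
    """One candidate exploit and the profit measured by executing it."""
    candidate: str
    profit: float


def search_for_exploit(
    propose: Callable[[list[Attempt]], str],
    execute: Callable[[str], float],
    max_iterations: int = 5,
) -> Optional[Attempt]:
    """Return the first candidate whose execution is profitable, else None.

    Both callables are assumptions made for this sketch:
      propose(history) -> a new candidate, given earlier failed attempts
      execute(candidate) -> net profit observed when the candidate is run
    """
    history: list[Attempt] = []
    for _ in range(max_iterations):
        candidate = propose(history)
        attempt = Attempt(candidate, execute(candidate))
        if attempt.profit > 0:
            # Only a concretely profitable run counts as a finding.
            return attempt
        # Failed attempts are kept as feedback for the next proposal.
        history.append(attempt)
    return None
```

Because a candidate is returned only after `execute` reports a positive profit, speculative findings that never run successfully are filtered out by construction.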
Using Monte Carlo analysis of historical attacks, we demonstrate that
immediate vulnerability detection yields an 86-89% success probability, dropping
to 6-21% with week-long delays. Our economic analysis reveals a troubling
asymmetry: attackers become profitable at exploit values of \$6,000, whereas
defenders require \$60,000, which raises fundamental questions about whether AI
agents inevitably favor exploitation over defense.
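
One way to see how such an asymmetry can arise is a simple break-even calculation. The sketch below is a simplified model, not the paper's analysis: it assumes both sides pay the same cost per scan and find an exploitable contract at the same rate, and differ only in the share of the exploit value they capture. All parameter values, including the defender's 10% share (for example, a bounty), are hypothetical.

```python
def break_even_value(cost_per_scan: float, hit_rate: float, captured_share: float) -> float:
    """Smallest exploit value at which expected revenue covers the scan cost.

    Expected profit per scan = hit_rate * captured_share * value - cost_per_scan,
    so the break-even value is cost_per_scan / (hit_rate * captured_share).
    """
    if cost_per_scan < 0:
        raise ValueError("cost_per_scan must be non-negative")
    if not (0 < hit_rate <= 1 and 0 < captured_share <= 1):
        raise ValueError("hit_rate and captured_share must be in (0, 1]")
    return cost_per_scan / (hit_rate * captured_share)


# Hypothetical inputs, chosen only to illustrate the arithmetic.
COST_PER_SCAN = 1.0    # dollars spent per scanned contract
HIT_RATE = 0.01        # fraction of scans that yield a working exploit
ATTACKER_SHARE = 1.0   # an attacker keeps the full extracted value
DEFENDER_SHARE = 0.1   # a defender earns a fraction, e.g. via a bounty

attacker_threshold = break_even_value(COST_PER_SCAN, HIT_RATE, ATTACKER_SHARE)
defender_threshold = break_even_value(COST_PER_SCAN, HIT_RATE, DEFENDER_SHARE)

# The ratio depends only on the two shares, not on cost or hit rate.
threshold_ratio = defender_threshold / attacker_threshold
print(f"attacker break-even: {attacker_threshold:.2f}")
print(f"defender break-even: {defender_threshold:.2f}")
print(f"ratio: {threshold_ratio:.2f}")
```

Under these assumptions the defender's threshold is higher by the inverse of the share ratio, a factor of ten here, which matches the gap between \$6,000 and \$60,000. The absolute thresholds depend on cost and hit-rate parameters that are not given in this abstract.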