PACEbench: A Framework for Evaluating Practical AI Cyber-Exploitation Capabilities

Information in the literature database is collected automatically.

Source

https://arxiv.org/abs/2510.11688

PDF

https://arxiv.org/pdf/2510.11688

Paper Information

Authors: Zicheng Liu, Lige Huang, Jie Zhang, Dongrui Liu, Yuan Tian, Jing Shao
Published: 10-14-2025
Affiliation: Shanghai Artificial Intelligence Laboratory
Country: China

Labels Estimated by AI

Security Analysis Method, Large Language Model, Defense Mechanism

These labels were automatically added by AI and may be inaccurate.
For details, see About Literature Database.

Abstract

The increasing autonomy of Large Language Models (LLMs) necessitates a rigorous evaluation of their potential to aid in cyber offense. Existing benchmarks often lack real-world complexity and are thus unable to accurately assess LLMs' cybersecurity capabilities. To address this gap, we introduce PACEbench, a practical AI cyber-exploitation benchmark built on the principles of realistic vulnerability difficulty, environmental complexity, and cyber defenses. Specifically, PACEbench comprises four scenarios spanning single, blended, chained, and defense vulnerability exploitations. To handle these complex challenges, we propose PACEagent, a novel agent that emulates human penetration testers by supporting multi-phase reconnaissance, analysis, and exploitation. Extensive experiments with seven frontier LLMs demonstrate that current models struggle with complex cyber scenarios, and none can bypass defenses. These findings suggest that current models do not yet pose a generalized cyber offense threat. Nonetheless, our work provides a robust benchmark to guide the trustworthy development of future models.