Hackers or Hallucinators? A Comprehensive Analysis of LLM-Based Automated Penetration Testing



The information in this literature database is collected automatically.

Source

https://arxiv.org/abs/2604.05719

PDF

https://arxiv.org/pdf/2604.05719

Bibliographic Information

Authors: Jiaren Peng, Zeqin Li, Chang You, Yan Wang, Hanlin Sun, Xuan Tian, Shuqiao Zhang, Junyi Liu, Jianguo Zhao, Renyang Liu, Haoran Ou, Yuqiang Sun, Jiancheng Zhang, Yutong Jiao, Kunshu Song, Chao Zhang, Fan Shi, Hongda Sun, Rui Yan, Cheng Huang
Published: 2026-04-07
Affiliation: School of Cyber Science and Engineering, Sichuan University
Country of affiliation: China
Conference:

AI-Estimated Labels

RAG, Framework, Poisoning attacks on RAG

※ These labels were added automatically by AI and may therefore be inaccurate.
For details, see "About the Literature Database."

Abstract

The rapid advancement of Large Language Models (LLMs) has created new opportunities for Automated Penetration Testing (AutoPT), spawning numerous frameworks aimed at achieving end-to-end autonomous attacks. However, despite the proliferation of related studies, existing research generally lacks systematic architectural analysis and large-scale empirical comparisons under a unified benchmark. Therefore, this paper presents the first Systematization of Knowledge (SoK) focusing on the architectural design and comprehensive empirical evaluation of current LLM-based AutoPT frameworks. At the systematization level, we comprehensively review existing framework designs across six dimensions: agent architecture, agent plan, agent memory, agent execution, external knowledge, and benchmarks. At the empirical level, we conduct large-scale experiments on 13 representative open-source AutoPT frameworks and 2 baseline frameworks using a unified benchmark. The experiments consumed over 10 billion tokens in total and generated more than 1,500 execution logs, which were manually reviewed and analyzed over four months by a panel of more than 15 researchers with expertise in cybersecurity. By investigating the latest progress in this rapidly developing field, we provide researchers with a structured taxonomy for understanding existing LLM-based AutoPT frameworks and a large-scale empirical benchmark, along with promising directions for future research.