Abstract
Penetration testing, a crucial industrial practice for ensuring system
security, has traditionally resisted automation due to the extensive expertise
required of human professionals. Large Language Models (LLMs) have shown
significant advancements in various domains, and their emergent abilities
suggest their potential to revolutionize industries. In this research, we
evaluate the performance of LLMs on real-world penetration testing tasks using
a robust benchmark created from test machines with platforms. Our findings
reveal that while LLMs demonstrate proficiency in specific sub-tasks within the
penetration testing process, such as using testing tools, interpreting outputs,
and proposing subsequent actions, they also encounter difficulties maintaining
an integrated understanding of the overall testing scenario.
In response to these insights, we introduce PentestGPT, an LLM-empowered
automatic penetration testing tool that leverages the abundant domain knowledge
inherent in LLMs. PentestGPT is meticulously designed with three
self-interacting modules, each addressing individual sub-tasks of penetration
testing, to mitigate the challenges related to context loss. Our evaluation
shows that PentestGPT not only outperforms LLMs with a task-completion increase
of 228.6% compared to the GPT-3.5 model among the benchmark targets but also
proves effective in tackling real-world penetration testing challenges. Having
been open-sourced on GitHub, PentestGPT has garnered over 4,700 stars and
fostered active community engagement, attesting to its value and impact in both
the academic and industrial spheres.