AutoPentest: Enhancing Vulnerability Management With Autonomous LLM Agents


arXiv


Source

https://arxiv.org/abs/2505.10321

PDF

https://arxiv.org/pdf/2505.10321

Paper Information

Author: Julius Henke
Published: 5-15-2025
Affiliation: University of Amsterdam
Country: Netherlands
Conference: Computing Research Repository (CoRR)

Labels Estimated by AI

RAG, LLM Security, Indirect Prompt Injection

These labels were automatically added by AI and may be inaccurate.

Abstract

The use of Large Language Models (LLMs) in penetration testing is a growing area of research that promises to reduce costs and thus allow tests to be run more frequently. We review related work, identifying best practices and common evaluation issues. We then present AutoPentest, an application for performing black-box penetration tests with a high degree of autonomy. AutoPentest is built on OpenAI's GPT-4o LLM and the LangChain LLM agent framework. It can perform complex multi-step tasks, augmented by external tools and knowledge bases. We conduct a study on three capture-the-flag (CTF) style Hack The Box (HTB) machines, comparing AutoPentest with the baseline approach of manually using the ChatGPT-4o user interface. Both approaches complete 15-25% of the subtasks on the HTB machines, with AutoPentest slightly outperforming ChatGPT. Using AutoPentest across all experiments cost a total of US$96.20, whereas a one-month ChatGPT Plus subscription costs US$20. The results show that further implementation effort and more powerful LLMs released in the future are likely to make this approach a viable part of vulnerability management.
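The abstract describes an agent that chains LLM decisions with external tools over several steps. The following is a minimal, hypothetical sketch of that general loop in plain Python, under explicit assumptions: the LLM is replaced by a scripted stand-in (`scripted_llm`), and the tool names (`port_scan`, `kb_lookup`), the placeholder target address, and the `Agent` and `Action` classes are all illustrative inventions. The sketch does not use LangChain's API and is not AutoPentest's implementation. The tools return canned strings, so it runs offline and makes no network contact.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Action:
    """A decision from the (stubbed) LLM: call a tool, or finish with an answer."""
    tool: Optional[str]  # name of the tool to call; None means "finish"
    argument: str = ""
    answer: str = ""     # final report, used only when tool is None


# Hypothetical tools: a real agent would wrap scanners or a retrieval-backed
# knowledge base here. These return canned strings and touch no network.
def fake_port_scan(target: str) -> str:
    return f"{target}: 22/tcp open ssh, 80/tcp open http"


def fake_knowledge_base(query: str) -> str:
    notes = {"http": "Enumerate the web application and check for known CMS versions."}
    return notes.get(query, "no entry")


TOOLS: Dict[str, Callable[[str], str]] = {
    "port_scan": fake_port_scan,
    "kb_lookup": fake_knowledge_base,
}


def scripted_llm(history: List[str]) -> Action:
    """Stand-in for the LLM: chooses the next step from the transcript so far."""
    if not any(entry.startswith("port_scan") for entry in history):
        return Action(tool="port_scan", argument="10.10.10.10")  # placeholder address
    if not any(entry.startswith("kb_lookup") for entry in history):
        return Action(tool="kb_lookup", argument="http")
    return Action(tool=None, answer="Next step: enumerate the web service on port 80.")


@dataclass
class Agent:
    policy: Callable[[List[str]], Action]
    tools: Dict[str, Callable[[str], str]]
    max_steps: int = 10
    history: List[str] = field(default_factory=list)

    def run(self, task: str) -> str:
        """Alternate decide -> act -> observe until the policy finishes or the budget runs out."""
        self.history.append(f"task: {task}")
        for _ in range(self.max_steps):
            action = self.policy(self.history)
            if action.tool is None:
                return action.answer
            tool = self.tools.get(action.tool)
            observation = tool(action.argument) if tool else f"error: unknown tool {action.tool}"
            # Feed the tool output back so the next decision can build on it.
            self.history.append(f"{action.tool}({action.argument}) -> {observation}")
        return "step budget exhausted"


if __name__ == "__main__":
    agent = Agent(policy=scripted_llm, tools=TOOLS)
    print(agent.run("Find a foothold on the target"))
    for entry in agent.history:
        print(entry)
```

The step budget (`max_steps`) is one simple way to keep an autonomous loop from running indefinitely, which also bounds API cost when the policy is a paid LLM.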

External Datasets

Hack The Box (HTB) machines: Devvortex, Broker, Codify