Black-box scanners have played a significant role in detecting
vulnerabilities in web applications. A key focus in current black-box scanning
is increasing test coverage (i.e., accessing more web pages). However, since
many web applications are user-oriented, some deep pages can only be accessed
through complex user interactions, making them difficult for existing
black-box scanners to reach. To fill this gap, we build on a key insight: web
pages contain a wealth of semantic information that can aid in understanding
potential user intentions. Based on this insight, we propose Hoyen, a black-box
scanner that uses a Large Language Model (LLM) to predict user intentions and
guide the expansion of the scanning scope. Hoyen has been rigorously evaluated
on 12 popular open-source web applications and compared with 6 representative
tools.
The results demonstrate that Hoyen explores web applications comprehensively,
expanding the attack surface while achieving, on average, about 2x the coverage
of other scanners with high request accuracy. Furthermore, Hoyen directed over
90% of its requests towards the core functionality of the application and
detected more vulnerabilities than other scanners, including unique
vulnerabilities in well-known web applications. Our data and code are
available at https://hoyen.tjunsl.com/