Abstract
Phishing websites remain a significant cybersecurity threat, necessitating
accurate and cost-effective detection mechanisms. In this paper, we present
CLASP, a novel system that effectively identifies phishing websites by
leveraging multiple intelligent agents, built using large language models
(LLMs), to analyze different aspects of a web resource. The system processes
URLs or QR codes, employing specialized LLM-based agents that evaluate the URL
structure, webpage screenshot, and HTML content to predict potential phishing
threats. To balance performance against operational cost, we evaluated
several strategies for combining the agents' analyses and selected one that
keeps the per-website evaluation cost low without compromising detection
accuracy. We
tested various LLMs, including Gemini 1.5 Flash and GPT-4o mini, to build these
agents and found that Gemini 1.5 Flash achieved the best performance with an F1
score of 83.01% on a newly curated dataset. The system also maintained an
average processing time of 2.78 seconds per website at an API cost of about
$3.18 per 1,000 websites. Moreover, CLASP outperforms leading prior solutions,
achieving over 40% higher recall and a 20% improvement in F1 score for phishing
detection on the collected dataset. To support further research and the
development of more advanced phishing detection systems, we have made our
dataset publicly available.
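
To put the reported per-website figures in operational terms, the following minimal sketch scales them to a larger batch. It uses only the two averages stated above (2.78 seconds and $3.18 per 1,000 websites). The function name `estimate_budget` is hypothetical, and the assumption that websites are processed strictly one at a time, with cost and time scaling linearly, is ours rather than a claim about how CLASP is deployed.

```python
# Figures reported in the abstract.
COST_PER_1000_USD = 3.18   # average API cost per 1,000 websites
SECONDS_PER_SITE = 2.78    # average processing time per website


def estimate_budget(num_sites: int) -> tuple[float, float]:
    """Return (API cost in USD, wall-clock hours) for num_sites.

    Assumes linear scaling and strictly sequential processing;
    both are simplifying assumptions, not reported properties.
    """
    cost_usd = num_sites * COST_PER_1000_USD / 1000
    hours = num_sites * SECONDS_PER_SITE / 3600
    return cost_usd, hours


cost_usd, hours = estimate_budget(10_000)
print(f"10,000 sites: about ${cost_usd:.2f} and {hours:.1f} h sequentially")
```

Under these assumptions, a batch of 10,000 websites would cost roughly $31.80 in API calls; running agents in parallel would shorten the wall-clock time but not the cost.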