Phish-Blitz: Advancing Phishing Detection with Comprehensive Webpage Resource Collection and Visual Integrity Preservation

arxiv

Information in the literature database is collected automatically.

Source

https://arxiv.org/abs/2509.08375

PDF

https://arxiv.org/pdf/2509.08375

Paper Information

Authors: Duddu Hriday, Aditya Kulkarni, Vivek Balachandran, Tamal Das
Published: 2025-09-10
Affiliation: Indian Institute of Technology, Dharwad
Country: India
Conference: International Conference on Communication Systems and Networks (COMSNETS)

Labels Estimated by AI

Phishing Attack Trends, Website Vulnerability, Visual Similarity Detection

These labels were automatically added by AI and may be inaccurate.

Abstract

Phishing attacks are increasingly prevalent, with adversaries creating deceptive webpages to steal sensitive information. Despite advancements in machine learning and deep learning for phishing detection, attackers constantly develop new tactics to bypass detection models. As a result, phishing webpages continue to reach users, particularly those unable to recognize phishing indicators. To improve detection accuracy, models must be trained on large datasets containing both phishing and legitimate webpages, including URLs, webpage content, screenshots, and logos. However, existing tools struggle to collect the required resources, especially given the short lifespan of phishing webpages, limiting dataset comprehensiveness. In response, we introduce Phish-Blitz, a tool that downloads phishing and legitimate webpages along with their associated resources, such as screenshots. Unlike existing tools, Phish-Blitz captures live webpage screenshots and updates resource file paths to maintain the original visual integrity of the webpage. We provide a dataset containing 8,809 legitimate and 5,000 phishing webpages, including all associated resources. Our dataset and tool are publicly available on GitHub, contributing to the research community by offering a more complete dataset for phishing detection.
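
The abstract says Phish-Blitz updates resource file paths so that a saved page keeps its original appearance. The record gives no implementation details, so the following is only a minimal sketch of that general idea, not the authors' code. The names `RESOURCE_ATTRS`, `local_name`, `ResourceCollector`, and `rewrite_resource_paths`, as well as the `resources/` directory layout, are all assumptions made for illustration. The sketch resolves each resource reference against the page URL, assigns it a stable local filename, and rewrites the HTML to point at that local copy.

```python
import hashlib
import posixpath
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

# Assumed (hypothetical) mapping of tags to the attribute holding a resource URL.
RESOURCE_ATTRS = {"img": "src", "script": "src", "link": "href"}


def local_name(resource_url: str) -> str:
    """Derive a stable local path from an absolute resource URL."""
    digest = hashlib.sha256(resource_url.encode("utf-8")).hexdigest()[:16]
    ext = posixpath.splitext(urlparse(resource_url).path)[1]
    return f"resources/{digest}{ext}"


class ResourceCollector(HTMLParser):
    """Collect resource references found in img, script, and link tags."""

    def __init__(self, base_url: str):
        super().__init__()
        self.base_url = base_url
        # original attribute value -> (absolute URL, local path)
        self.mapping = {}

    def handle_starttag(self, tag, attrs):
        wanted = RESOURCE_ATTRS.get(tag)
        for name, value in attrs:
            if name == wanted and value:
                absolute = urljoin(self.base_url, value)
                self.mapping[value] = (absolute, local_name(absolute))


def rewrite_resource_paths(html: str, base_url: str):
    """Return the HTML with resource references pointing at local paths.

    Limitations of this sketch: only double-quoted attribute values are
    rewritten, and values containing character references (such as &amp;)
    are left unchanged because the parser reports them unescaped.
    """
    collector = ResourceCollector(base_url)
    collector.feed(html)
    for original, (_absolute, local) in collector.mapping.items():
        html = html.replace(f'"{original}"', f'"{local}"')
    return html, collector.mapping


if __name__ == "__main__":
    page = '<html><body><img src="img/logo.png"></body></html>'
    rewritten, mapping = rewrite_resource_paths(page, "https://example.com/login/")
    print(rewritten)
    for absolute, local in mapping.values():
        # A downloader would fetch `absolute` and save it at `local`.
        print(absolute, "->", local)
```

A collector built this way would still need to download each absolute URL to its local path and capture the screenshot while the page is live; both steps are outside this sketch.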

External Datasets

5000 phishing webpages

8809 legitimate webpages