AI Security Portal K Program
Multimodal Large Language Models for Phishing Webpage Detection and Identification
Abstract
To address the challenging problem of detecting phishing webpages, researchers have developed numerous solutions, in particular those based on machine learning (ML) algorithms. Among these, brand-based phishing detection, which uses computer vision models to detect whether a given webpage is imitating a well-known brand, has received widespread attention. However, such models are costly and difficult to maintain, as they need to be retrained with labeled datasets that have to be collected regularly and continuously. They also need to maintain a good reference list of well-known websites and related metadata to perform effectively. In this work, we study the efficacy of large language models (LLMs), in particular multimodal LLMs, in detecting phishing webpages. Given that LLMs are pretrained on a large corpus of data, we aim to make use of their understanding of different aspects of a webpage (logo, theme, favicon, etc.) to identify the brand of a given webpage and compare the identified brand with the domain name in the URL to detect a phishing attack. We propose a two-phase system employing LLMs in both phases: the first phase focuses on brand identification, while the second verifies the domain. We carry out comprehensive evaluations on a newly collected dataset. Our experiments show that the LLM-based system achieves a high detection rate at high precision; importantly, it also provides interpretable evidence for its decisions. Our system also performs significantly better than a state-of-the-art brand-based phishing detection system while demonstrating robustness against two known adversarial attacks.
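The two-phase flow described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the names `detect`, `Verdict`, `toy_identify_brand`, and `toy_domain_matches_brand` are hypothetical, and the two toy functions are simple placeholders standing in for the LLM calls used in each phase. Treating a page with no identified brand as "not flagged" is also an assumption of this sketch.

```python
from dataclasses import dataclass
from typing import Callable, Optional
from urllib.parse import urlparse


@dataclass
class Verdict:
    brand: Optional[str]   # brand identified in phase 1, if any
    domain: str            # hostname taken from the URL
    is_phishing: bool
    reason: str            # short, human-readable evidence


def extract_domain(url: str) -> str:
    """Return the lowercased hostname of a URL (no port, no path)."""
    return (urlparse(url).hostname or "").lower()


def detect(
    url: str,
    screenshot: bytes,
    html: str,
    identify_brand: Callable[[bytes, str], Optional[str]],
    domain_matches_brand: Callable[[str, str], bool],
) -> Verdict:
    """Phase 1: identify the brand. Phase 2: verify the domain against it."""
    domain = extract_domain(url)
    brand = identify_brand(screenshot, html)  # placeholder for an LLM call
    if brand is None:
        # Assumption of this sketch: no brand means nothing to compare.
        return Verdict(None, domain, False, "no brand identified")
    if domain_matches_brand(brand, domain):   # placeholder for an LLM call
        return Verdict(brand, domain, False, f"{domain} belongs to {brand}")
    return Verdict(brand, domain, True, f"{domain} does not belong to {brand}")


# Toy stand-ins so the sketch runs without any model access.
TOY_BRAND_DOMAINS = {"PayPal": "paypal.com"}


def toy_identify_brand(screenshot: bytes, html: str) -> Optional[str]:
    """Keyword lookup in the HTML; a real system would query a multimodal LLM."""
    return "PayPal" if "paypal" in html.lower() else None


def toy_domain_matches_brand(brand: str, domain: str) -> bool:
    """Exact or subdomain match against a tiny hard-coded table."""
    legit = TOY_BRAND_DOMAINS.get(brand)
    if legit is None:
        return False
    return domain == legit or domain.endswith("." + legit)
```

Passing the two phases in as callables keeps the control flow separate from whichever model or prompt performs each step, so the toy functions can be swapped for real model calls without changing `detect`.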
Multimodal Large Language Models for Phishing Webpage Detection and Identification
Jehyun Lee, Peiyuan Lim, Bryan Hooi, Dinil Mon Divakaran
Published: 2024.8.12