PhishLang: A Lightweight, Client-Side Phishing Detection Framework using MobileBERT for Real-Time, Explainable Threat Mitigation

Jan Kocoń, Igor Cichecki, Oliwier Kaszyca, Mateusz Kochanek, Dominika Szydło, Joanna Baran, Julita Bielaniewicz, Marcin Gruza, Arkadiusz Janz, Kamil Kanclerz

Published: 2023

CoRR

Detecting phishing sites using ChatGPT

T. Koide, N. Fukushi, H. Nakano, D. Chiba

Published: 2023

2012 7th international conference on risks and security of internet and systems (CRiSIS)

Don’t work. Can’t work? Why it’s time to rethink security warnings

Kat Krol, Matthew Moroz, M Angela Sasse

Published: 2012

arXiv

Citations: 4

URLNet: Learning a URL Representation with Deep Learning for Malicious URL Detection

Hung Le, Quang Pham, Doyen Sahoo, Steven C. H. Hoi

Published: 2018.2.9

Malicious URLs host unsolicited content and are used to perpetrate cybercrimes. It is imperative to detect them in a timely manner. Traditionally, this is done through the usage of blacklists, which cannot be exhaustive, and cannot detect newly generated malicious URLs. To address this, recent years have witnessed several efforts to perform Malicious URL Detection using Machine Learning. The most popular and scalable approaches use lexical properties of the URL string by extracting Bag-of-words like features, followed by applying machine learning models such as SVMs. There are also other features designed by experts to improve the prediction performance of the model. These approaches suffer from several limitations: (i) Inability to effectively capture semantic meaning and sequential patterns in URL strings; (ii) Requiring substantial manual feature engineering; and (iii) Inability to handle unseen features and generalize to test data. To address these challenges, we propose URLNet, an end-to-end deep learning framework to learn a nonlinear URL embedding for Malicious URL Detection directly from the URL. Specifically, we apply Convolutional Neural Networks to both characters and words of the URL String to learn the URL embedding in a jointly optimized framework. This approach allows the model to capture several types of semantic information, which was not possible by the existing models. We also propose advanced word-embeddings to solve the problem of too many rare words observed in this task. We conduct extensive experiments on a large-scale dataset and show a significant performance gain over existing methods. We also conduct ablation studies to evaluate the performance of various components of URLNet.

Machine learning methods; membership inference; model inversion

NDSS MADWeb

Building robust phishing detection system: an empirical analysis

J. Lee, P. Ye, R. Liu, D. M. Divakaran, M. C. Chan

Published: 2020

Future Generation Computer Systems

A stacking model using URL and HTML features for phishing webpage detection

Y. Li, Z. Yang, X. Chen, H. Yuan, W. Liu

Published: 2019

Proceedings of the 25th International Conference on World Wide Web

Cracking classifiers for evasion: a case study on the Google’s phishing pages filter

Bin Liang, Miaoqiang Su, Wei You, Wenchang Shi, Gang Yang

Published: 2016

30th USENIX Security Symposium (USENIX Security 21)

Phishpedia: a hybrid deep learning based approach to visually identify phishing webpages

Lin, Y., Liu, R., Divakaran, D.M., Ng, J.Y., Chan, Q.Z., Lu, Y., Si, Y., Zhang, F., Dong, J.S.

Published: 2021

31st USENIX Security Symposium (USENIX Security 22)

Inferring phishing intention via webpage appearance and dynamics: A deep vision based approach

Ruofan Liu, Yun Lin, Xianglin Yang, Siang Hwee Ng, Dinil Mon Divakaran, Jin Song Dong

Published: 2022

Italian Journal of Marketing

Hands off my data: Users’ security concerns and intention to adopt privacy enhancing technologies

Federico Mangiò, Daniela Andreini, Giuseppe Pedeliento

Published: 2020

School of Computing and Engineering, University of Huddersfield

Phishing websites features

Rami M Mohammad, Fadi Thabtah, Lee McCluskey

Published: 2015

arXiv

Citations: 1

AISec@CCS

Raze to the Ground: Query-Efficient Adversarial HTML Attacks on Machine-Learning Phishing Webpage Detectors

Biagio Montaruli, Luca Demetrio, Maura Pintor, Luca Compagna, Davide Balzarotti, Battista Biggio

Published: 2023.10.5

Machine-learning phishing webpage detectors (ML-PWD) have been shown to suffer from adversarial manipulations of the HTML code of the input webpage. Nevertheless, the attacks recently proposed have demonstrated limited effectiveness due to their lack of optimizing the usage of the adopted manipulations, and they focus solely on specific elements of the HTML code. In this work, we overcome these limitations by first designing a novel set of fine-grained manipulations which allow to modify the HTML code of the input phishing webpage without compromising its maliciousness and visual appearance, i.e., the manipulations are functionality- and rendering-preserving by design. We then select which manipulations should be applied to bypass the target detector by a query-efficient black-box optimization algorithm. Our experiments show that our attacks are able to raze to the ground the performance of current state-of-the-art ML-PWD using just 30 queries, thus overcoming the weaker attacks developed in previous work, and enabling a much fairer robustness evaluation of ML-PWD.

Poisoning; phishing; machine learning methods

Information Systems Journal

A comparison of features in a crowdsourced phishing warning system

Christopher Nguyen, Matthew L Jensen, Alexandra Durcikova, Ryan T Wright

Published: 2021

2013 International Conference on Advanced Technologies for Communications

Detecting phishing web sites: A heuristic URL-based approach

Luong Anh Tuan Nguyen, Ba Lam To, Huu Khuong Nguyen, Minh Hoang Nguyen

Published: 2013

29th USENIX Security Symposium

PhishTime: Continuous longitudinal measurement of the effectiveness of anti-phishing blacklists

Adam Oest, Yeganeh Safaei, Penghui Zhang, Brad Wardman, Kevin Tyers, Yan Shoshitaishvili, Adam Doupé

Published: 2020

2018 APWG Symposium on Electronic Crime Research (eCrime)

Inside a phisher’s mind: Understanding the anti-phishing ecosystem through phishing kit analysis

Adam Oest, Yeganeh Safaei, Adam Doupé, Gail-Joon Ahn, Brad Wardman, Gary Warner

Published: 2018

29th USENIX Security Symposium (USENIX Security 20)

Sunrise to sunset: Analyzing the end-to-end life cycle and effectiveness of phishing attacks at scale

Adam Oest, Penghui Zhang, Brad Wardman, Eric Nunes, Jakub Burgis, Ali Zand, Kurt Thomas, Adam Doupé, Gail-Joon Ahn

Published: 2020

OpenPhish Monthly Brands List

OpenPhish

Published: 2022

CSET

Towards adversarial phishing detection

Thomas Kobber Panum, Kaspar Hageman, René Rydhof Hansen, Jens Myrup Pedersen

Published: 2020

Proceedings of the Internet Measurement Conference

Opening the blackbox of VirusTotal: Analyzing online phishing scan engines

Peng Peng, Limin Yang, Linhai Song, Gang Wang

Published: 2019

S&P

Intriguing properties of adversarial ML attacks in the problem space

Fabio Pierazzi, Feargus Pendlebury, Jacopo Cortellazzi, Lorenzo Cavallaro

Published: 2020

Proceedings of the 19th ACM Asia Conference on Computer and Communications Security

Deep Dive into Client-Side Anti-Phishing: A Longitudinal Study Bridging Academia and Industry

Rana Pourmohamad, Steven Wirsz, Adam Oest, Tiffany Bao, Yan Shoshitaishvili, Ruoyu Wang, Adam Doupé, Rida A Bazzi

Published: 2024

OpenAI blog

Language models are unsupervised multitask learners

A. Radford, J. Wu, R. Child, D. Luan, D. Amodei, I. Sutskever

Published: 2019

Big Data and Cognitive Computing

MalBERTv2: Code aware BERT-based model for malware identification

Abir Rahali, Moulay A Akhloufi

Published: 2023

Proceedings of the seventh symposium on usable privacy and security

A brick wall, a locked door, and a bandit: a physical security metaphor for firewall warnings

Fahimeh Raja, Kirstie Hawkey, Steven Hsu, Kai-Le Clement Wang, Konstantin Beznosov

Published: 2011

2021 APWG Symposium on Electronic Crime Research (eCrime)

Evaluating the effectiveness of phishing reports on twitter

Sayak Saha Roy, Unique Karanjit, Shirin Nilizadeh

Published: 2021

Proceedings of the 2023 ACM on Internet Measurement Conference

Phishing in the Free Waters: A Study of Phishing Attacks Created using Free Website Building Services

Sayak Saha Roy, Unique Karanjit, Shirin Nilizadeh

Published: 2023

Expert Systems with Applications

Machine learning based phishing detection from URLs

Ozgur Koray Sahingoz, Ebubekir Buber, Onder Demir, Banu Diri

Published: 2019

DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter

Victor Sanh, Lysandre Debut, Julien Chaumond, Thomas Wolf

Published: 2019

IT5: Large-scale text-to-text pre-training for Italian language understanding and generation

Gabriele Sarti, Malvina Nissim

Published: 2022

Data and Applications Security and Privacy XXXIII: 33rd Annual IFIP WG 11.3 Conference, DBSec 2019

Adversarial sampling attacks against phishing detection

Hossein Shirazi, Bruhadeshwar Bezawada, Indrakshi Ray, Charles Anderson

Published: 2019

The Promise and Perils of Google’s Bard for Scientific Research

SM Siad

Published: 2023

Black Hat

I’m not a human: Breaking the Google reCAPTCHA

Suphannee Sivakorn, Jason Polakis, Angelos D Keromytis

Published: 2016

US Patent

Systems and methods for risk rating and pro-actively detecting malicious online ads

Jayesh Sreedharan, Rahul Mohandas

Published: 2016

Browser Market Share Worldwide

Sensors

BERT-Based Approaches to Identifying Malicious URLs

Ming-Yang Su, Kuan-Lin Su

Published: 2023

Proceedings of the 22nd ACM Internet Measurement Conference

PhishInPatterns: measuring elicited user interactions at scale on phishing websites

Karthika Subramani, William Melicher, Oleksii Starov, Phani Vadrevu, Roberto Perdisci

Published: 2022

MobileBERT: a compact task-agnostic BERT for resource-limited devices

Zhiqing Sun, Hongkun Yu, Xiaodan Song, Renjie Liu, Yiming Yang, Denny Zhou

Published: 2020

SecNHack

iFrame Injection Attacks and Mitigation

SecNHack Team

Published: 2022

Journal of Information Privacy and Security

Recent survey of various defense mechanisms against phishing attacks

Aakanksha Tewari, AK Jain, BB Gupta

Published: 2016

Llama-2-7B-Chat-GGUF

TheBloke

How this scammer used phishing emails to steal over $100 million from Google and Facebook

CNBC Tom Huddleston Jr.

arXiv

Citations: 1

USENIX Security Symposium

Improving Robustness of ML Classifiers against Realizable Evasion Attacks Using Conserved Features

Liang Tong, Bo Li, Chen Hajaj, Chaowei Xiao, Ning Zhang, Yevgeniy Vorobeychik

Published: 2017.8.28

Machine learning (ML) techniques are increasingly common in security applications, such as malware and intrusion detection. However, ML models are often susceptible to evasion attacks, in which an adversary makes changes to the input (such as malware) in order to avoid being detected. A conventional approach to evaluate ML robustness to such attacks, as well as to design robust ML, is by considering simplified feature-space models of attacks, where the attacker changes ML features directly to effect evasion, while minimizing or constraining the magnitude of this change. We investigate the effectiveness of this approach to designing robust ML in the face of attacks that can be realized in actual malware (realizable attacks). We demonstrate that in the context of structure-based PDF malware detection, such techniques appear to have limited effectiveness, but they are effective with content-based detectors. In either case, we show that augmenting the feature space models with conserved features (those that cannot be unilaterally modified without compromising malicious functionality) significantly improves performance. Finally, we show that feature space models enable generalized robustness when faced with a variety of realizable attacks, as compared to classifiers which are tuned to be robust to a specific realizable attack.

Robustness analysis; adversarial learning; model extraction attacks

Cybersecurity Ventures

Beware of Lookalike Domains in Punycode Phishing Attacks

Cybersecurity Ventures

Published: 2019

Financial Cryptography and Data Security: FC 2013 Workshops, USEC and WAHC 2013

QRishing: The susceptibility of smartphone users to QR code phishing attacks

Timothy Vidas, Emmanuel Owusu, Shuai Wang, Cheng Zeng, Lorrie Faith Cranor, Nicolas Christin

Published: 2013

ACM Computing Surveys (CSUR)

Systematic literature review on usability of firewall configuration

Artem Voronkov, Leonardo Horn Iwaya, Leonardo A Martucci, Stefan Lindskog

Published: 2017

2021 9th International Symposium on Digital Forensics and Security (ISDFS)

Malware detection and classification using fastText and BERT

Salih Yesir, İbrahim Soğukpinar

Published: 2021

arXiv

Citations: 1

Multi-SpacePhish: Extending the Evasion-space of Adversarial Attacks against Phishing Website Detectors using Machine Learning

Ying Yuan, Giovanni Apruzzese, Mauro Conti

Published: 2022.10.25

Existing literature on adversarial Machine Learning (ML) focuses either on showing attacks that break every ML model, or defenses that withstand most attacks. Unfortunately, little consideration is given to the actual feasibility of the attack or the defense. Moreover, adversarial samples are often crafted in the "feature-space", making the corresponding evaluations of questionable value. Simply put, the current situation does not allow to estimate the actual threat posed by adversarial attacks, leading to a lack of secure ML systems. We aim to clarify such confusion in this paper. By considering the application of ML for Phishing Website Detection (PWD), we formalize the "evasion-space" in which an adversarial perturbation can be introduced to fool a ML-PWD -- demonstrating that even perturbations in the "feature-space" are useful. Then, we propose a realistic threat model describing evasion attacks against ML-PWD that are cheap to stage, and hence intrinsically more attractive for real phishers. After that, we perform the first statistically validated assessment of state-of-the-art ML-PWD against 12 evasion attacks. Our evaluation shows (i) the true efficacy of evasion attempts that are more likely to occur; and (ii) the impact of perturbations crafted in different evasion-spaces. Our realistic evasion attempts induce a statistically significant degradation (3-10% at p<0.05), and their cheap cost makes them a subtle threat. Notably, however, some ML-PWD are immune to our most realistic attacks (p=0.22). Finally, as an additional contribution of this journal publication, we are the first to consider the intriguing case wherein an attacker introduces perturbations in multiple evasion-spaces at the same time. These new results show that simultaneously applying perturbations in the problem- and feature-space can cause a drop in the detection rate from 0.95 to 0.

Malicious website detection; poisoning; attack scenario analysis

Proceedings of the ACM on Web Conference 2024

Understanding the Users’ Perception of Adversarial Webpages

Ying Yuan, Qingying Hao, Giovanni Apruzzese, Mauro Conti, Gang Wang

Published: 2024

Proceedings of the AAAI Conference on Artificial Intelligence

Semantics-aware BERT for language understanding

Zhuosheng Zhang, Yuwei Wu, Hai Zhao, Zuchao Li, Shuailiang Zhang, Xi Zhou, Xiang Zhou

Published: 2020