Multilingual Email Phishing Attacks Detection using OSINT and Machine Learning

Source

https://arxiv.org/abs/2501.08723

PDF

https://arxiv.org/pdf/2501.08723

Paper Information

Authors: Panharith An; Rana Shafi; Tionge Mughogho; Onyango Allan Onyango
Published: January 15, 2025
Conference: Computing Research Repository (CoRR)

Labels Estimated by AI

Phishing Detection; Classification Model

These labels were automatically added by AI and may be inaccurate.

Abstract

Email phishing remains a prevalent cyber threat, targeting victims to extract sensitive information or deploy malicious software. This paper explores the integration of open-source intelligence (OSINT) tools and machine learning (ML) models to enhance phishing detection across multilingual datasets. Using Nmap and theHarvester, this study extracted 17 features, including domain names, IP addresses, and open ports, to improve detection accuracy. Multilingual email datasets, including English and Arabic, were analyzed to address the limitations of ML models trained predominantly on English data. Experiments with five classification algorithms (Decision Tree, Random Forest, Support Vector Machine, XGBoost, and Multinomial Naïve Bayes) revealed that Random Forest achieved the highest performance, with an accuracy of 97.37% for both English and Arabic datasets. For OSINT-enhanced datasets, the model demonstrated an improvement in accuracy compared to baseline models without OSINT features. These findings highlight the potential of combining OSINT tools with advanced ML models to detect phishing emails more effectively across diverse languages and contexts. This study contributes an approach to phishing detection by incorporating OSINT features and evaluating their impact on multilingual datasets, addressing a critical gap in cybersecurity research.
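
The abstract names domain names and IP addresses among the OSINT-derived features, but this page does not describe how they were obtained from the emails. As a rough illustration only, the sketch below shows one way the preprocessing step before any Nmap or theHarvester lookup could pull candidate domains and IP addresses out of a raw email. The function name `extract_indicators`, the regular expressions, and the returned field names are assumptions made for this example and are not taken from the paper.

```python
import re
from email import message_from_string
from email.utils import parseaddr

# Hypothetical patterns for illustration; not the paper's feature definitions.
# Captures the host part of http(s) URLs.
URL_HOST_RE = re.compile(r"https?://([^\s/:\"'<>]+)", re.IGNORECASE)
# Matches dotted-quad strings; it does not check that each octet is <= 255.
IPV4_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def extract_indicators(raw_email: str) -> dict:
    """Collect domains and IPs that an OSINT tool could later be pointed at."""
    msg = message_from_string(raw_email)

    # Sender domain from the From header, if one is present.
    _, sender = parseaddr(msg.get("From", ""))
    sender_domain = sender.rsplit("@", 1)[1].lower() if "@" in sender else ""

    # Join the text of all leaf parts so multipart messages are handled too.
    body = " ".join(
        str(part.get_payload())
        for part in msg.walk()
        if not part.is_multipart()
    )

    hosts = [h.lower() for h in URL_HOST_RE.findall(body)]
    return {
        "sender_domain": sender_domain,
        "url_hosts": sorted(set(hosts)),
        "ip_addresses": sorted(set(IPV4_RE.findall(body))),
        "num_urls": len(hosts),
    }
```

The returned hosts and addresses would be the inputs to a separate lookup step (for example, a port scan or a domain search), whose results could then be appended to the text features before training a classifier such as Random Forest. That later step is not sketched here because the page gives no detail on it.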

External Datasets

English phishing emails from Kaggle

English Sample

English OSINT

Arabic Sample

Arabic OSINT