Abstract
Email phishing remains a prevalent cyber threat, targeting victims to extract
sensitive information or deploy malicious software. This paper explores the
integration of open-source intelligence (OSINT) tools and machine learning (ML)
models to enhance phishing detection across multilingual datasets. Using Nmap
and theHarvester, this study extracted 17 features, including domain names, IP
addresses, and open ports, to improve detection accuracy. Multilingual email
datasets, including English and Arabic, were analyzed to address the
limitations of ML models trained predominantly on English data. Experiments
with five classification algorithms (Decision Tree, Random Forest, Support
Vector Machine, XGBoost, and Multinomial Naïve Bayes) revealed that Random
Forest achieved the highest performance, with an accuracy of 97.37% for both
English and Arabic datasets. On OSINT-enhanced datasets, the model
achieved higher accuracy than baseline models without
OSINT features. These findings highlight the potential of combining OSINT tools
with advanced ML models to detect phishing emails more effectively across
diverse languages and contexts. This study contributes an approach to phishing
detection by incorporating OSINT features and evaluating their impact on
multilingual datasets, addressing a critical gap in cybersecurity research.
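
As a rough illustration of the first step such a pipeline needs, the sketch below pulls candidate domains out of a raw email so that they could later be passed to OSINT tools such as Nmap or theHarvester. It is not the study's implementation: the function name `extract_domains`, the URL pattern, and the returned field names are all hypothetical, and the 17 features used in the paper are not reproduced here.

```python
import re
from email import message_from_string
from email.utils import parseaddr
from urllib.parse import urlparse

# Hypothetical, deliberately simple URL pattern; real emails need a more robust one.
URL_PATTERN = re.compile(r"https?://[^\s<>\"']+")


def extract_domains(raw_email: str) -> dict:
    """Collect the sender domain and linked domains from a raw email.

    Illustrative only: the returned keys are assumptions, not the
    feature set described in the abstract.
    """
    msg = message_from_string(raw_email)

    # Sender domain from the From header (empty string if absent or malformed).
    _, sender = parseaddr(msg.get("From", ""))
    sender_domain = sender.rpartition("@")[2].lower() if "@" in sender else ""

    # Concatenate every text/plain part of the message body.
    body_parts = []
    for part in msg.walk():
        if part.get_content_type() == "text/plain":
            payload = part.get_payload(decode=True) or b""
            charset = part.get_content_charset() or "utf-8"
            body_parts.append(payload.decode(charset, errors="replace"))
    body = "\n".join(body_parts)

    # Hostnames of all URLs found in the body, de-duplicated and sorted.
    hosts = {urlparse(url).hostname for url in URL_PATTERN.findall(body)}
    url_domains = sorted(h for h in hosts if h)

    return {
        "sender_domain": sender_domain,
        "url_domains": url_domains,
        "num_urls": len(URL_PATTERN.findall(body)),
    }
```

The domains returned by `extract_domains` would then be the inputs to the OSINT lookups (for example, resolving IP addresses or scanning open ports), whose results form additional columns alongside the text features given to the classifiers.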