User-Centric Phishing Detection: A RAG and LLM-Based Approach

Authors: Abrar Hamed Al Barwani, Abdelaziz Amara Korba, Raja Waseem Anwar | Published: 2026-01-29

2026.01.292026.01.31

Authors: Abrar Hamed Al Barwani, Abdelaziz Amara Korba, Raja Waseem Anwar
Published: 2026-01-29

Source: https://arxiv.org/abs/2601.21261

PDF: https://arxiv.org/pdf/2601.21261

Labels Predicted by AI

Poisoning attack on RAG LLM Performance Evaluation

Please note that these labels were automatically added by AI. Therefore, they may not be entirely accurate.
For more details, please see the About the Literature Database page.

Abstract

The escalating sophistication of phishing emails necessitates a shift beyond traditional rule-based and conventional machine-learning-based detectors. Although large language models (LLMs) offer strong natural language understanding, using them as standalone classifiers often yields elevated falsepositive (FP) rates, which mislabel legitimate emails as phishing and create significant operational burden. This paper presents a personalized phishing detection framework that integrates LLMs with retrieval-augmented generation (RAG). For each message, the system constructs user-specific context by retrieving a compact set of the user’s historical legitimate emails and enriching it with real-time domain and URL reputation from a cyber-threat intelligence platform, then conditions the LLM’s decision on this evidence. We evaluate four open-source LLMs (Llama4-Scout, DeepSeek-R1, Mistral-Saba, and Gemma2) on an email dataset collected from public and institutional sources. Results show high performance; for example, Llama4-Scout attains an F1-score of 0.9703 and achieves a 66.7