These labels were automatically added by AI and may be inaccurate. For details, see About Literature Database.
Abstract
Phishing attacks are becoming increasingly sophisticated, underscoring the
need for detection systems that strike a balance between high accuracy and
computational efficiency. This paper presents a comparative evaluation of
traditional Machine Learning (ML), Deep Learning (DL), and quantized
small-parameter Large Language Models (LLMs) for phishing detection. Through
experiments on a curated dataset, we show that while LLMs currently
underperform compared to ML and DL methods in terms of raw accuracy, they
exhibit strong potential for identifying subtle, context-based phishing cues.
We also investigate the impact of zero-shot and few-shot prompting strategies,
revealing that LLM-rephrased emails can significantly degrade the performance
of both ML and LLM-based detectors. Our benchmarking highlights that models
like DeepSeek R1 Distill Qwen 14B (Q8_0) achieve competitive accuracy, above
80%, using only 17GB of VRAM, supporting their viability for cost-efficient
deployment. We further assess the models' adversarial robustness and
cost-performance tradeoffs, and demonstrate how lightweight LLMs can provide
concise, interpretable explanations to support real-time decision-making. These
findings position optimized LLMs as promising components in phishing defence
systems and offer a path forward for integrating explainable, efficient AI into
modern cybersecurity frameworks.