Abstract
Targeted phishing emails are on the rise and facilitate the theft of billions
of dollars from organizations each year. While malicious signals from attached
files or malicious URLs in emails can be detected by conventional malware
signatures or machine learning technologies, it is challenging to identify
hand-crafted social engineering emails that contain no malicious code
and share no word choices with known attacks. To tackle this problem, we
fine-tune a pre-trained BERT model by replacing half of the Transformer blocks
with simple adapters to efficiently learn sophisticated representations of the
syntax and semantics of natural language. Our Context-Aware network also
learns contextual representations that relate an email's content to context
features extracted from its headers. Our CatBERT (Context-Aware Tiny BERT)
achieves an 87% detection rate at a 1% false positive rate, compared to 83%,
79%, and 54% for the DistilBERT, LSTM, and logistic regression baselines,
respectively. Our model is also faster than competing Transformer approaches
and is resilient to adversarial attacks that deliberately replace keywords
with typos or synonyms.
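The abstract does not specify the adapter design. A common choice in the adapter literature is a bottleneck module: project the hidden state down to a small dimension, apply a nonlinearity, project back up, and add the input as a residual. The plain-Python sketch below assumes that design. The names `BottleneckAdapter` and `layer_plan`, the zero-initialized up-projection, and the choice to replace every other block (rather than, say, the top half) are illustrative assumptions, not CatBERT's published implementation.

```python
import math
import random


def matvec(matrix, vec):
    """Multiply a matrix stored as a list of rows by a vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]


def gelu(x):
    """Tanh approximation of the GELU activation used in BERT."""
    return 0.5 * x * (1.0 + math.tanh(math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)))


class BottleneckAdapter:
    """Hypothetical adapter: down-project, GELU, up-project, then add the input back."""

    def __init__(self, hidden_size, bottleneck_size, seed=0):
        rng = random.Random(seed)
        # Small random down-projection (bottleneck_size x hidden_size).
        self.down = [[rng.gauss(0.0, 0.02) for _ in range(hidden_size)]
                     for _ in range(bottleneck_size)]
        # Zero up-projection (hidden_size x bottleneck_size), so the adapter
        # starts out as the identity map (assumption for this sketch).
        self.up = [[0.0] * bottleneck_size for _ in range(hidden_size)]

    def __call__(self, hidden):
        bottleneck = [gelu(v) for v in matvec(self.down, hidden)]
        delta = matvec(self.up, bottleneck)
        # Residual connection: output = input + adapter update.
        return [h + d for h, d in zip(hidden, delta)]

    def num_params(self):
        """Weight count (this sketch omits biases)."""
        return sum(len(row) for row in self.down) + sum(len(row) for row in self.up)


def layer_plan(num_layers):
    """Hypothetical plan: keep even-indexed Transformer blocks and swap
    odd-indexed ones for adapters, i.e. replace half of the blocks."""
    return ["transformer" if i % 2 == 0 else "adapter" for i in range(num_layers)]


adapter = BottleneckAdapter(hidden_size=768, bottleneck_size=64)
print(adapter.num_params())  # 2 * 768 * 64 = 98304 weights
print(layer_plan(12))
```

With a hidden size of 768 and a 64-unit bottleneck, each adapter in this sketch holds 98,304 weights, far fewer than the roughly 7 million parameters of a BERT-base Transformer block. That gap is one plausible source of the speed and size savings the abstract reports.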
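The abstract says the Context-Aware network relates an email's content to features from its headers, but does not give the fusion mechanism. The sketch below assumes one simple option: header features produce a sigmoid gate per content dimension, and a linear layer over the gated content plus the raw header features yields a phishing probability. The function `context_aware_score`, its parameters, and the example header features (`reply_to_mismatch`, `external_sender`) are hypothetical and only illustrate the idea.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def context_aware_score(content_vec, header_feats, gate_w, gate_b, out_w, out_b):
    """Hypothetical fusion of a content embedding with header features.

    gate_w: len(content_vec) rows x len(header_feats) columns
    gate_b: one bias per content dimension
    out_w:  len(content_vec) + len(header_feats) output weights
    """
    # One gate per content dimension, computed from the header features.
    gates = [sigmoid(sum(w * f for w, f in zip(row, header_feats)) + b)
             for row, b in zip(gate_w, gate_b)]
    gated = [g * c for g, c in zip(gates, content_vec)]
    # Concatenate gated content with the raw header features.
    fused = gated + list(header_feats)
    logit = sum(w * x for w, x in zip(out_w, fused)) + out_b
    return sigmoid(logit)


content = [0.3, -1.2, 0.8]  # toy stand-in for a pooled text embedding
headers = [1.0, 0.0]        # hypothetical: reply_to_mismatch, external_sender
prob = context_aware_score(
    content, headers,
    gate_w=[[0.5, -0.2], [0.1, 0.3], [-0.4, 0.2]],
    gate_b=[0.0, 0.0, 0.0],
    out_w=[0.7, -0.5, 0.9, 1.2, 0.4],
    out_b=-0.3,
)
print(prob)
```

Gating lets header evidence, such as a mismatched reply-to domain, change how much each content dimension counts. A plain concatenation would treat the two sources independently.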
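The headline results are detection rates (true positive rates) measured at a fixed 1% false positive rate. One straightforward way to compute such a number is to pick the score threshold from the benign samples so that at most 1% of them exceed it, then count how many malicious samples exceed the same threshold. The function name and the strict-inequality tie handling below are assumptions. The paper may instead interpolate on an ROC curve.

```python
def detection_rate_at_fpr(scores, labels, target_fpr=0.01):
    """Fraction of malicious samples (label 1) flagged when the threshold is
    set so that at most target_fpr of benign samples (label 0) exceed it."""
    if not 0.0 <= target_fpr < 1.0:
        raise ValueError("target_fpr must be in [0, 1)")
    benign = sorted((s for s, y in zip(scores, labels) if y == 0), reverse=True)
    malicious = [s for s, y in zip(scores, labels) if y == 1]
    if not benign or not malicious:
        raise ValueError("need both benign and malicious samples")
    # Number of benign samples allowed to score above the threshold.
    allowed = int(target_fpr * len(benign))
    # Flag only scores strictly greater than this benign score.
    threshold = benign[allowed]
    return sum(s > threshold for s in malicious) / len(malicious)


# Toy scores for illustration only (not the paper's data).
benign_scores = [i / 100 for i in range(100)]
malicious_scores = [0.995, 0.985, 0.5, 0.2]
scores = benign_scores + malicious_scores
labels = [0] * len(benign_scores) + [1] * len(malicious_scores)
print(detection_rate_at_fpr(scores, labels, target_fpr=0.01))
```

Fixing the false positive rate rather than reporting accuracy reflects the operational constraint in email filtering: each false alarm on a legitimate message has a cost, so models are compared at the same tolerable alarm rate.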