Targeted phishing emails are on the rise and facilitate the theft of billions
of dollars from organizations each year. While malicious signals from attached
files or embedded URLs can be detected by conventional malware signatures or
machine learning technologies, it is challenging to identify hand-crafted
social engineering emails that contain no malicious code and share no word
choices with known attacks. To tackle this problem, we
fine-tune a pre-trained BERT model, replacing half of its Transformer blocks
with simple adapters to efficiently learn sophisticated representations of the
syntax and semantics of natural language. Our Context-Aware network also
learns joint representations of an email's content and context features drawn
from its headers. Our CatBERT (Context-Aware Tiny BERT) achieves an 87%
detection rate, compared with DistilBERT, LSTM, and logistic regression
baselines, which achieve 83%, 79%, and 54% detection rates, respectively, at a
false positive rate of 1%. Our model is also faster than competing Transformer
approaches and is resilient to adversarial attacks that deliberately replace
keywords with typos or synonyms.
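The abstract names two mechanisms: bottleneck adapters inserted into a frozen encoder, and fusing the content embedding with context features from email headers. A minimal numpy sketch of both, with illustrative dimensions and random weights standing in for anything learned (the actual CatBERT architecture and parameters are not shown here), might look like this:

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def adapter(h, w_down, w_up):
    """Bottleneck adapter: project down, apply a nonlinearity,
    project back up, and add a residual connection."""
    return h + relu(h @ w_down) @ w_up

rng = np.random.default_rng(0)
d_model, d_bottleneck, d_context = 768, 64, 8

# Hypothetical [CLS]-style content embedding from a frozen BERT encoder.
cls = rng.standard_normal(d_model)

# Adapter weights: in adapter-based fine-tuning, only these small
# matrices are trained while the pre-trained encoder stays frozen.
w_down = rng.standard_normal((d_model, d_bottleneck)) * 0.02
w_up = rng.standard_normal((d_bottleneck, d_model)) * 0.02

h = adapter(cls, w_down, w_up)

# Hypothetical context features derived from email headers
# (e.g. sender/recipient relationship flags).
ctx = rng.standard_normal(d_context)

# Fuse content and context by concatenation, then score with a
# logistic classifier head.
fused = np.concatenate([h, ctx])
w_clf = rng.standard_normal(d_model + d_context) * 0.02
score = 1.0 / (1.0 + np.exp(-(fused @ w_clf)))  # phishing probability
```

The residual connection lets an adapter start near the identity function, which is why replacing Transformer blocks with adapters can preserve much of the pre-trained representation while training far fewer parameters.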