Targeted phishing emails are on the rise and facilitate the theft of billions
of dollars from organizations each year. While malicious signals from attached
files or embedded URLs can be detected by conventional malware signatures or
machine learning technologies, it is challenging to identify hand-crafted
social engineering emails that contain no malicious code and share no word
choices with known attacks. To tackle this problem, we
fine-tune a pre-trained BERT model, replacing half of its Transformer blocks
with simple adapters to efficiently learn sophisticated representations of the
syntax and semantics of natural language. Our Context-Aware network also
learns joint representations of an email's content and context features drawn
from its headers. Our CatBERT (Context-Aware Tiny BERT) achieves an 87%
detection rate, compared with DistilBERT, LSTM, and logistic regression
baselines, which achieve 83%, 79%, and 54% detection rates, respectively, at a
false positive rate of 1%. Our model is also faster than competing Transformer
approaches and is resilient to adversarial attacks that deliberately replace
keywords with typos or synonyms.
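The abstract names two mechanisms: bottleneck adapters inserted into a frozen encoder, and fusing the content embedding with context features from email headers. A minimal numpy sketch of both, with illustrative dimensions and random weights standing in for anything learned (the actual CatBERT architecture and parameters are not shown here), might look like this:

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def adapter(h, w_down, w_up):
    """Bottleneck adapter: project down, apply a nonlinearity,
    project back up, and add a residual connection."""
    return h + relu(h @ w_down) @ w_up

rng = np.random.default_rng(0)
d_model, d_bottleneck, d_context = 768, 64, 8

# Hypothetical [CLS]-style content embedding from a frozen BERT encoder.
cls = rng.standard_normal(d_model)

# Adapter weights: in adapter-based fine-tuning, only these small
# matrices are trained while the pre-trained encoder stays frozen.
w_down = rng.standard_normal((d_model, d_bottleneck)) * 0.02
w_up = rng.standard_normal((d_bottleneck, d_model)) * 0.02

h = adapter(cls, w_down, w_up)

# Hypothetical context features derived from email headers
# (e.g. sender/recipient relationship flags).
ctx = rng.standard_normal(d_context)

# Fuse content and context by concatenation, then score with a
# logistic classifier head.
fused = np.concatenate([h, ctx])
w_clf = rng.standard_normal(d_model + d_context) * 0.02
score = 1.0 / (1.0 + np.exp(-(fused @ w_clf)))  # phishing probability
```

The residual connection lets an adapter start near the identity function, which is why replacing Transformer blocks with adapters can preserve much of the pre-trained representation while training far fewer parameters.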