Different Victims, Same Layout: Email Visual Similarity Detection for Enhanced Email Protection

Source

https://arxiv.org/abs/2408.16945

PDF

https://arxiv.org/pdf/2408.16945

Paper Information

Authors: Sachin Shukla; Omid Mirzaei
Published: 2024-08-30
Updated: 2024-09-04
Affiliation: Cisco Talos, CA, USA
Country: United States of America
Conference: Computing Research Repository (CoRR)

Abstract

In the pursuit of an effective spam detection system, the focus has often been on identifying known spam patterns either through rule-based detection systems or machine learning (ML) solutions that rely on keywords. However, both systems are susceptible to evasion techniques and zero-day attacks that can be achieved at low cost. Therefore, an email that bypassed the defense system once can do it again in the following days, even though rules are updated or the ML models are retrained. The recurrence of failures to detect emails that exhibit layout similarities to previously undetected spam is concerning for customers and can erode their trust in a company. Our observations show that threat actors reuse email kits extensively and can bypass detection with little effort, for example, by making changes to the content of emails. In this work, we propose an email visual similarity detection approach, named Pisco, to improve the detection capabilities of an email threat defense system. We apply our proof of concept to some real-world samples received from different sources. Our results show that email kits are being reused extensively and visually similar emails are sent to our customers at various time intervals. Therefore, this method could be very helpful in situations where detection engines that rely on textual features and keywords are bypassed, an occurrence our observations show happens frequently.
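The abstract does not describe how Pisco works internally, so the following is only a minimal sketch of the general idea that two emails built from the same kit can stay similar in layout even when their text changes. It is not the authors' method. As a stand-in for a rendering-based visual comparison, it compares the ordered sequence of HTML tags in each email body using Python's standard library. The function names, the sample emails, and the use of tag sequences as a layout signature are all assumptions made for illustration.

```python
from difflib import SequenceMatcher
from html.parser import HTMLParser


class TagSequence(HTMLParser):
    """Collects the ordered list of opening tags and ignores all text."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


def layout_signature(html):
    """Hypothetical layout signature: the email's tag sequence."""
    parser = TagSequence()
    parser.feed(html)
    parser.close()
    return parser.tags


def layout_similarity(html_a, html_b):
    """Return a ratio in [0, 1]; 1.0 means identical tag sequences."""
    sig_a = layout_signature(html_a)
    sig_b = layout_signature(html_b)
    return SequenceMatcher(None, sig_a, sig_b).ratio()


# Hypothetical samples: email_a and email_b share a layout but differ in
# wording and link target; email_c uses an unrelated layout.
email_a = (
    "<html><body><table><tr><td><img src='logo.png'></td></tr>"
    "<tr><td><p>Your invoice is ready</p>"
    "<a href='http://a.example'>View</a></td></tr></table></body></html>"
)
email_b = (
    "<html><body><table><tr><td><img src='brand.png'></td></tr>"
    "<tr><td><p>Payment confirmation attached</p>"
    "<a href='http://b.example'>Open</a></td></tr></table></body></html>"
)
email_c = (
    "<html><body><h1>Newsletter</h1>"
    "<ul><li>Item one</li><li>Item two</li></ul></body></html>"
)

same_kit_score = layout_similarity(email_a, email_b)
different_score = layout_similarity(email_a, email_c)
print(same_kit_score, different_score)
```

In this toy comparison the reworded email keeps the same signature as the original, while the unrelated layout scores much lower. A keyword-based detector would see the first two emails as different, which is the gap the abstract describes.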

External Datasets

Emails received from different sources in the authors' own corpus