A Deep Learning Approach to Fast, Format-Agnostic Detection of Malicious Web Content

arXiv

AI Security Portal bot

The information in this literature database is collected automatically.

Source

https://arxiv.org/abs/1804.05020

PDF

https://arxiv.org/pdf/1804.05020

Bibliographic Information

Authors: Joshua Saxe, Richard Harang, Cody Wild, Hillary Sanders
Published: 2018-04-14
Affiliation: Sophos
Country of affiliation: United Kingdom
Conference: IEEE Symposium on Security and Privacy Workshops

Labels Estimated by AI

Deep learning, Backdoor model detection, Web page content analysis

* These labels were added automatically by AI and may therefore be inaccurate.
For details, see "About the Literature Database".

Abstract

Malicious web content is a serious problem on the Internet today. In this paper we propose a deep learning approach to detecting malevolent web pages. While past work on web content detection has relied on syntactic parsing or on emulation of HTML and JavaScript to extract features, our approach operates directly on a language-agnostic stream of tokens extracted from static HTML files with a simple regular expression. This makes it fast enough to operate in high-frequency data contexts like firewalls and web proxies, and allows it to avoid the attack surface exposure of complex parsing and emulation code. Unlike well-known approaches such as bag-of-words models, which ignore spatial information, our neural network examines content at hierarchical spatial scales, allowing our model to capture locality and yielding superior accuracy compared to bag-of-words baselines. Our proposed architecture achieves a 97.5% detection rate at a 0.1% false positive rate, and classifies small-batched web pages at a rate of over 100 per second on commodity hardware. The speed and accuracy of our approach make it appropriate for deployment to endpoints, firewalls, and web proxies.
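
The abstract describes two ideas that a short example can make concrete: tokenizing raw HTML with a simple regular expression instead of parsing it, and looking at the token stream at several spatial scales. The Python sketch below illustrates only that feature-extraction side and is not the authors' network. The `\w+` pattern, the chunk count, the bin count, the CRC32 hashing trick, and the averaging of neighbouring chunks are all assumptions made for illustration; the abstract does not specify any of them, and the names `tokenize`, `hashed_counts`, and `hierarchical_features` are hypothetical.

```python
import re
import zlib
from typing import List

# Assumed pattern: the abstract only says "a simple regular expression".
TOKEN_RE = re.compile(r"\w+")


def tokenize(html: str) -> List[str]:
    """Split raw HTML into a flat token stream without parsing it."""
    return TOKEN_RE.findall(html)


def hashed_counts(tokens: List[str], n_bins: int) -> List[float]:
    """Bag-of-tokens count vector for one chunk, using the hashing trick."""
    vec = [0.0] * n_bins
    for tok in tokens:
        vec[zlib.crc32(tok.encode("utf-8")) % n_bins] += 1.0
    return vec


def hierarchical_features(html: str, n_chunks: int = 4,
                          n_bins: int = 8) -> List[List[List[float]]]:
    """Return one list of chunk vectors per scale, finest scale first.

    Scale 0 holds n_chunks vectors (local view); each coarser scale
    averages neighbouring pairs until a single whole-document vector is left.
    """
    if n_chunks < 1 or n_chunks & (n_chunks - 1):
        raise ValueError("n_chunks must be a power of two")
    tokens = tokenize(html)
    # Ceiling division gives contiguous chunks of near-equal length.
    size = max(1, -(-len(tokens) // n_chunks))
    level = [hashed_counts(tokens[i * size:(i + 1) * size], n_bins)
             for i in range(n_chunks)]
    levels = [level]
    while len(level) > 1:
        level = [[(a + b) / 2 for a, b in zip(level[i], level[i + 1])]
                 for i in range(0, len(level), 2)]
        levels.append(level)
    return levels
```

Calling `hierarchical_features(page_source, n_chunks=4)` yields three scales holding 4, 2, and 1 vectors. A plain bag-of-words model would see only the last of these, whereas the finer scales keep track of where in the page a token occurred, which is the locality the abstract refers to. In a full system these vectors would feed a learned classifier; that part is omitted here.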