Adversarial classification is the task of performing robust classification in
the presence of a strategic attacker. Originating from information hiding and
multimedia forensics, adversarial classification has recently received
considerable attention in a broader security context. In the domain of machine
learning-based image classification, adversarial classification can be
interpreted as detecting so-called adversarial examples: slightly altered
versions of benign images that are specifically crafted to be misclassified
with high probability by the classifier under attack.
Neural networks, which dominate among modern image classifiers, have been shown
to be especially vulnerable to these adversarial examples.
However, detecting subtle changes in digital images has always been the goal
of multimedia forensics and steganalysis. In this paper, we highlight the
parallels between these two fields and secure machine learning.
Furthermore, we adapt a linear filter, similar to early steganalysis methods,
to detect adversarial examples generated with the projected gradient descent
(PGD) method, the state-of-the-art algorithm for crafting such examples. We test
our method on the MNIST database and show that, for several PGD parameter
combinations, it reliably detects adversarial examples.
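The abstract does not spell out the attack configuration or the filter itself; the following is a minimal sketch, assuming a differentiable PyTorch classifier, an L_inf-bounded PGD attack, and a common 3x3 high-pass residual kernel from early steganalysis standing in for the linear filter. The detection threshold would have to be calibrated on benign MNIST images, and none of these choices are claimed to match the paper's exact setup.

```python
import numpy as np
import torch
import torch.nn.functional as F
from scipy.signal import convolve2d

def pgd_attack(model, x, y, eps=0.3, alpha=0.01, steps=40):
    """L_inf PGD: repeated gradient-sign steps, projected back onto the
    eps-ball around the benign input x (pixel values kept in [0, 1])."""
    x_adv = x.clone().detach()
    for _ in range(steps):
        x_adv = x_adv.detach().requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        x_adv = x_adv + alpha * grad.sign()
        x_adv = torch.min(torch.max(x_adv, x - eps), x + eps)
        x_adv = x_adv.clamp(0.0, 1.0)
    return x_adv.detach()

# A 3x3 high-pass kernel of the kind used to compute residuals in early
# steganalysis (assumed here; the paper's actual filter may differ).
HP_KERNEL = np.array([[-1.,  2., -1.],
                      [ 2., -4.,  2.],
                      [-1.,  2., -1.]])

def residual_statistic(img):
    """Mean absolute high-pass residual of a single 2-D image in [0, 1]."""
    return np.abs(convolve2d(img, HP_KERNEL, mode="valid")).mean()

def is_adversarial(img, threshold):
    """Flag an image whose residual energy exceeds a threshold calibrated
    on benign MNIST images (e.g. a high percentile of their statistics)."""
    return residual_statistic(img) > threshold
```

The intuition behind such a detector is that an L_inf-bounded PGD perturbation behaves like high-frequency noise spread over the otherwise smooth MNIST digits, so a simple linear high-pass residual can separate perturbed from benign images once a suitable threshold is chosen.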
Additionally, the combination of adversarial re-training and our detection
method effectively reduces the attack surface of neural networks. Thus, we
conclude that adversarial examples for image classification may not withstand
detection methods from steganalysis, and future work
should explore the effectiveness of known techniques from multimedia forensics
in other adversarial settings.