An intriguing property of deep neural networks is their inherent
vulnerability to adversarial inputs, which significantly hinders their
application in security-critical domains. Most existing detection methods
attempt to use carefully engineered patterns to distinguish adversarial inputs
from their genuine counterparts; such patterns, however, can often be
circumvented by adaptive adversaries. In this work, we take a completely different route by
leveraging the defining property of adversarial inputs: while deceptive to deep
neural networks, they are barely discernible to human vision. Building upon
recent advances in interpretable models, we construct a new detection framework
that contrasts an input's interpretation against its classification. We
validate the efficacy of this framework through extensive experiments using
benchmark datasets and attacks. We believe that this work opens a new direction
for designing adversarial input detection methods.
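
The abstract only sketches the framework at a high level, so the snippet below is an illustrative interpretation-versus-classification consistency check, not the authors' implementation. The model, the gradient-saliency interpreter, and the keep_ratio and agree_thresh parameters of the hypothetical is_adversarial helper are all assumptions made for the sake of a concrete, runnable example.

# A minimal sketch (assumed mechanism, not the paper's detector): flag an
# input as adversarial when its classification is poorly supported by its
# own interpretation, here a simple gradient-saliency map.
import torch
import torch.nn as nn
import torch.nn.functional as F

model = nn.Sequential(  # stand-in classifier; any trained CNN would do
    nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(8, 10),
)
model.eval()

def saliency_map(x, cls):
    """Gradient of the predicted class score w.r.t. the input pixels."""
    x = x.clone().requires_grad_(True)
    model(x)[0, cls].backward()
    return x.grad.abs().sum(dim=1, keepdim=True)  # shape (1, 1, H, W)

def is_adversarial(x, keep_ratio=0.2, agree_thresh=0.5):
    """Heuristic check: keep only the most salient pixels and test whether
    the prediction on that evidence still agrees with the original one;
    weak agreement is taken as a sign of an adversarial input."""
    with torch.no_grad():
        pred = model(x).argmax(dim=1).item()
    sal = saliency_map(x, pred)
    k = int(keep_ratio * sal.numel())
    cutoff = sal.flatten().topk(k).values.min()
    masked = x * (sal >= cutoff).float()        # retain salient evidence only
    with torch.no_grad():
        probs = F.softmax(model(masked), dim=1)[0]
    return probs[pred].item() < agree_thresh    # weak support => flag input

x = torch.rand(1, 3, 32, 32)                    # placeholder input
print("flagged as adversarial:", is_adversarial(x))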