Neural networks are known to be vulnerable to adversarial attacks -- slight
but carefully constructed perturbations of the inputs that can drastically
impair the network's performance. Many defense methods have been proposed to
improve the robustness of deep networks by training them on adversarially
perturbed inputs. However, these models often remain vulnerable to new types of
attacks not seen during training, and even to slightly stronger versions of
previously seen attacks. In this work, we propose a novel approach to
adversarial robustness that builds on insights from the field of domain
adaptation. Our method, called Adversarial Feature Desensitization (AFD),
aims to learn features that are invariant to adversarial perturbations of
the inputs. This is achieved through a game in which we learn features that
are both predictive and robust (insensitive to adversarial attacks), i.e.,
features that cannot be used to discriminate between natural and
adversarial data. Empirical results on
several benchmarks demonstrate the effectiveness of the proposed approach
against a wide range of attack types and attack strengths. Our code is
available at https://github.com/BashivanLab/afd.
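
Conceptually, the game described above can be sketched as a domain-adversarial training loop: a discriminator learns to tell natural from adversarial features, while the feature extractor learns to stay predictive and to fool that discriminator. The PyTorch-style sketch below is illustrative only; the module names (feature_net, task_head, domain_disc), the attack helper make_adversarial, and the unweighted loss sum are placeholder assumptions, not the exact architecture or objective used in the paper or repository.

```python
import torch
import torch.nn.functional as F

def afd_step(feature_net, task_head, domain_disc, opt_feat, opt_disc,
             x_nat, y, make_adversarial):
    """One illustrative training step of the feature-desensitization game.

    opt_feat updates feature_net and task_head; opt_disc updates domain_disc.
    make_adversarial is a placeholder for any attack (e.g., PGD) run against
    the current model.
    """
    # Craft adversarial examples for the current model.
    x_adv = make_adversarial(feature_net, task_head, x_nat, y)

    # --- Discriminator update: distinguish natural from adversarial features.
    with torch.no_grad():
        f_nat = feature_net(x_nat)
        f_adv = feature_net(x_adv)
    d_nat = domain_disc(f_nat)   # logits for "natural"
    d_adv = domain_disc(f_adv)   # logits for "adversarial"
    disc_loss = (
        F.binary_cross_entropy_with_logits(d_nat, torch.ones_like(d_nat))
        + F.binary_cross_entropy_with_logits(d_adv, torch.zeros_like(d_adv))
    )
    opt_disc.zero_grad()
    disc_loss.backward()
    opt_disc.step()

    # --- Feature/classifier update: remain predictive on both natural and
    #     adversarial inputs while fooling the discriminator, pushing the
    #     features to become insensitive to the perturbation.
    f_nat = feature_net(x_nat)
    f_adv = feature_net(x_adv)
    task_loss = (
        F.cross_entropy(task_head(f_nat), y)
        + F.cross_entropy(task_head(f_adv), y)
    )
    fool_logits = domain_disc(f_adv)
    fool_loss = F.binary_cross_entropy_with_logits(
        fool_logits, torch.ones_like(fool_logits)
    )
    opt_feat.zero_grad()
    (task_loss + fool_loss).backward()
    opt_feat.step()
```

In this sketch the two optimizers implement the two sides of the game: the discriminator improves at separating the feature distributions of natural and adversarial inputs, and the feature extractor is penalized whenever such separation is possible, which is the desensitization objective the abstract refers to.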