Despite the remarkable success of deep neural networks, significant concerns
have emerged about their robustness to adversarial perturbations to inputs.
While most attacks aim to ensure that such perturbations are imperceptible,
physical perturbation attacks typically aim to be unsuspicious, even if perceptible.
However, there is no universal notion of what it means for adversarial examples
to be unsuspicious. We propose an approach for modeling suspiciousness by
leveraging cognitive salience. Specifically, we split an image into foreground
(salient region) and background (the rest), and allow significantly larger
adversarial perturbations in the background, while ensuring that cognitive
salience of the background remains low. We describe how to compute the
resulting dual-perturbation attacks on classifiers.
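As a rough sketch (not the paper's exact formulation), such an attack can be viewed as PGD with two L-infinity budgets: a tight one on the salient foreground and a much looser one on the background. The PyTorch code below illustrates this; the function name, the specific budgets, and the assumption that a binary foreground mask is available are all hypothetical, and the explicit salience-preservation constraint is not modeled here.

```python
import torch
import torch.nn.functional as F

def dual_perturbation_attack(model, x, y, mask, eps_fg=2/255, eps_bg=16/255,
                             alpha=1/255, steps=40):
    """PGD-style sketch of a dual-perturbation attack (illustrative).

    x:    batch of images in [0, 1], shape (N, C, H, W)
    mask: binary foreground mask (1 on the salient region, 0 elsewhere)
    eps_fg, eps_bg: per-region L-inf budgets with eps_bg >> eps_fg, so the
        background absorbs much larger perturbations than the foreground.
    """
    # Per-pixel budget: tight on the foreground, loose on the background.
    eps = eps_fg * mask + eps_bg * (1.0 - mask)
    delta = torch.zeros_like(x, requires_grad=True)
    for _ in range(steps):
        loss = F.cross_entropy(model(x + delta), y)
        loss.backward()
        with torch.no_grad():
            # Ascend the loss, then project onto the region-wise L-inf balls.
            delta += alpha * delta.grad.sign()
            delta.copy_(torch.max(torch.min(delta, eps), -eps))
            # Keep the perturbed image in the valid pixel range.
            delta.copy_((x + delta).clamp(0.0, 1.0) - x)
        delta.grad.zero_()
    return (x + delta).detach()
```

A full implementation would additionally check that the perturbed background's computed salience stays low, per the constraint above.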
We then experimentally demonstrate that our attacks indeed do not
significantly change the perceptual salience of the background, yet are highly
effective against classifiers that are robust to conventional attacks.
Furthermore, we show that adversarial training with dual-perturbation attacks
yields classifiers that are more robust to these attacks than those produced
by state-of-the-art robust learning approaches, and comparable in robustness
to conventional attacks.
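To illustrate the final claim, a minimal sketch of adversarial training against dual-perturbation attacks might look as follows; it reuses the hypothetical dual_perturbation_attack above and assumes the data loader yields salience masks alongside images and labels.

```python
def adversarial_training_epoch(model, optimizer, loader):
    """One epoch of adversarial training with dual-perturbation attacks
    (illustrative sketch, not the paper's training procedure)."""
    for x, y, mask in loader:
        # Generate dual-perturbation adversarial examples on the fly.
        model.eval()
        x_adv = dual_perturbation_attack(model, x, y, mask)
        # Update the model on the adversarial examples.
        model.train()
        optimizer.zero_grad()
        loss = F.cross_entropy(model(x_adv), y)
        loss.backward()
        optimizer.step()
```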