Defenses against adversarial examples, such as adversarial training, are
typically tailored to a single perturbation type (e.g., small
$\ell_\infty$-noise). For other perturbations, these defenses offer no
guarantees and, at times, even increase the model's vulnerability. Our aim is
to understand the reasons underlying this robustness trade-off, and to train
models that are simultaneously robust to multiple perturbation types. We prove
that a trade-off in robustness to different types of $\ell_p$-bounded and
spatial perturbations must exist in a natural and simple statistical setting.
We corroborate our formal analysis by demonstrating similar robustness
trade-offs on MNIST and CIFAR10. Building upon new multi-perturbation
adversarial training schemes, and a novel efficient attack for finding
$\ell_1$-bounded adversarial examples, we show that models trained against
multiple attacks fail to achieve robustness competitive with that of models
trained against each attack individually. In particular, we uncover a pernicious
gradient-masking phenomenon on MNIST, which causes adversarial training with
first-order $\ell_\infty$, $\ell_1$, and $\ell_2$ adversaries to achieve merely
$50\%$ robust accuracy. Our results question the viability and computational
scalability of extending adversarial robustness, and adversarial training, to
multiple perturbation types.
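The multi-perturbation training schemes referenced above are not specified in this abstract; as a rough sketch (the notation $f_\theta$, the loss $L$, and the perturbation sets $S_1,\dots,S_k$ are introduced here for illustration only), such training can be viewed as replacing the single perturbation set of standard adversarial training with a union of sets:
\[
\min_\theta \; \mathbb{E}_{(x,y)\sim\mathcal{D}} \Big[\, \max_{\delta \in S_1 \cup \cdots \cup S_k} L\big(f_\theta(x+\delta),\, y\big) \Big],
\]
where in practice the inner maximum would be approximated by running one first-order attack per perturbation type and using either the strongest or the average of the resulting adversarial losses.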