We identify three common cases that lead to overestimation of adversarial
accuracy against bounded first-order attacks, a metric widely used as a proxy
for adversarial robustness in empirical studies. For each case, we
propose compensation methods that either address sources of inaccurate gradient
computation, such as numerical instability near zero and non-differentiability,
or reduce the total number of back-propagations for iterative attacks by
approximating second-order information. These compensation methods can be
combined with existing attack methods to yield a more precise empirical
evaluation metric. We illustrate the impact of these three cases with examples of
practical interest, such as benchmarking model capacity and regularization
techniques for robustness. Overall, our work shows that overestimated
adversarial accuracy, which is not indicative of true robustness, is prevalent
even for conventionally trained deep neural networks, and it cautions against
relying on empirical evaluation without guaranteed bounds.
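
As a concrete illustration of the first source of inaccurate gradients named
above, numerical instability near zero, consider a cross-entropy loss computed
naively as log(softmax(z)): when the model is confident, the softmax output of
a non-maximal class underflows to exactly zero, and the attack's loss gradient
degenerates to nan, making the model look spuriously robust. The following is
a minimal sketch, not the paper's own compensation method; the framework
(PyTorch), the three-class setup, and the logit gap of 200 are assumptions
chosen only to trigger float32 underflow.

    import torch
    import torch.nn.functional as F

    # Hypothetical three-class logits for a confidently classified input;
    # the logit gap of 200 makes the non-maximal softmax outputs underflow
    # to exactly 0 in float32.
    logits = torch.tensor([[200.0, 0.0, 0.0]], requires_grad=True)
    target = torch.tensor([1])  # class the attack tries to promote

    # Naive loss: the softmax probability of class 1 underflows to 0, so
    # log(0) = -inf and back-propagation yields nan -- the attack stalls
    # even though the model is not actually robust at this input.
    naive_loss = -torch.log(F.softmax(logits, dim=1))[0, 1]
    g_naive, = torch.autograd.grad(naive_loss, logits)

    # Stable loss: F.cross_entropy works in log-space (log-sum-exp trick),
    # keeping both the loss and its gradient finite and informative.
    stable_loss = F.cross_entropy(logits, target)
    g_stable, = torch.autograd.grad(stable_loss, logits)

    print(g_naive)   # tensor([[nan, nan, nan]]) -- gradient destroyed
    print(g_stable)  # finite gradient, usable by a first-order attack

Under these assumptions, an attack driven by the naive loss makes no progress
on confidently classified inputs, inflating the measured adversarial accuracy;
computing the same loss in log-space removes this artifact without changing
the attack itself.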