We consider the theoretical problem of designing an optimal adversarial
attack on a decision system, i.e., an attack that maximally degrades the
achievable performance of the system as measured by the mutual information
between the degraded signal and the label of interest. This problem is
motivated by the existence of
adversarial examples for machine learning classifiers. By adopting an
information theoretic perspective, we seek to identify conditions under which
adversarial vulnerability is unavoidable, i.e., even optimally designed
classifiers will be vulnerable to small adversarial perturbations. We present
derivations of the optimal adversarial attacks for discrete and continuous
signals of interest, i.e., we find the perturbation distributions that
minimize the mutual information between the degraded signal and a signal of
interest following a discrete or continuous distribution (as sketched below).
In addition, we show that such mutual-information-minimizing attacks become
much harder to mount when multiple redundant copies of the input signal are
available. This provides
additional support to the recently proposed ``feature compression'' hypothesis
as an explanation for the adversarial vulnerability of deep learning
classifiers. We also report computational experiments that illustrate our
theoretical results.
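
As a rough, informal sketch of the attack design problem described above (the
notation here is ours and may differ from the body of the paper; the amsmath
package is assumed): write $Y$ for the label of interest, $X$ for the clean
input signal, and $Z$ for the adversarial perturbation, whose distribution the
attacker chooses from a constraint set $\mathcal{P}_{\epsilon}$ of small
perturbations,
\begin{equation*}
  p_Z^{\star} \in \operatorname*{arg\,min}_{p_Z \in \mathcal{P}_{\epsilon}}
  I(X + Z;\, Y),
  \qquad
  \mathcal{P}_{\epsilon} = \bigl\{\, p_Z : \Pr(\lVert Z \rVert \le \epsilon) = 1 \,\bigr\},
\end{equation*}
where the almost-sure norm bound of radius $\epsilon$ is only one illustrative
choice of constraint set.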
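
Under the same hypothetical notation, one illustrative way to write the
multi-copy setting mentioned above replaces the single degraded signal with
$n$ redundant observations of $X$, each perturbed separately,
\begin{equation*}
  \min_{p_{Z_1},\dots,p_{Z_n}} \;
  I\bigl(X + Z_1,\, \dots,\, X + Z_n;\; Y\bigr),
  \qquad p_{Z_i} \in \mathcal{P}_{\epsilon} \text{ for each } i,
\end{equation*}
and the claim is that driving this mutual information down becomes much harder
as $n$ grows, intuitively because the perturbations must jointly mask the label
across every copy.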