Explanations can be manipulated and geometry is to blame

Conference on Neural Information Processing Systems (NeurIPS)

AI Security Portal bot

The information in this literature database is collected automatically.

Source

https://arxiv.org/abs/1906.07983

PDF

https://arxiv.org/pdf/1906.07983

Bibliographic information

Authors: Ann-Kathrin Dombrowski, Maximilian Alber, Christopher J. Anders, Marcel Ackermann, Klaus-Robert Müller, Pan Kessel
Published: 2019-06-19
Updated: 2019-09-26
Affiliation: Machine Learning Group, EE & Computer Science Faculty, TU-Berlin
Country: Germany
Conference: Conference on Neural Information Processing Systems (NeurIPS)

Labels estimated by AI

Attacks on explainability, model interpretability, robustness evaluation

Note: These labels were added automatically by AI and may therefore be inaccurate.
For details, see "About the Literature Database".

Abstract

Explanation methods aim to make neural networks more trustworthy and interpretable. In this paper, we demonstrate a property of explanation methods which is disconcerting for both of these purposes. Namely, we show that explanations can be manipulated arbitrarily by applying visually hardly perceptible perturbations to the input that keep the network's output approximately constant. We establish theoretically that this phenomenon can be related to certain geometrical properties of neural networks. This allows us to derive an upper bound on the susceptibility of explanations to manipulations. Based on this result, we propose effective mechanisms to enhance the robustness of explanations.
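
The manipulation described in the abstract can be illustrated with a toy sketch. This is not the authors' method or code. It assumes a hypothetical two-input softplus network (the weights `W`, `B`, `A` are made up), uses the plain input gradient as the explanation, and minimizes an assumed loss that pulls the explanation toward an arbitrary `target` while penalizing changes in the network output with an assumed weight `gamma`. It places no constraint on how perceptible the perturbation is, so it only shows that an explanation can move while the output stays roughly constant.

```python
import math

# Hypothetical toy network (all weights are illustrative assumptions):
# g(x) = sum_i A[i] * softplus(W[i] . x + B[i])
W = [(1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
B = [0.0, 0.0, -1.0]
A = [1.0, -1.0, 0.5]


def softplus(z):
    # Numerically stable log(1 + exp(z)).
    return math.log1p(math.exp(-abs(z))) + max(z, 0.0)


def sigmoid(z):
    # Stable form of 1 / (1 + exp(-z)); this is the derivative of softplus.
    return 0.5 * (1.0 + math.tanh(0.5 * z))


def output(x):
    return sum(a * softplus(w[0] * x[0] + w[1] * x[1] + b)
               for w, b, a in zip(W, B, A))


def explanation(x):
    # Gradient explanation h(x) = dg/dx, computed analytically.
    h = [0.0, 0.0]
    for w, b, a in zip(W, B, A):
        s = a * sigmoid(w[0] * x[0] + w[1] * x[1] + b)
        h[0] += s * w[0]
        h[1] += s * w[1]
    return h


def loss(x_adv, x, target, gamma):
    # Assumed objective: match the target explanation, keep the output fixed.
    h = explanation(x_adv)
    expl_term = sum((hi - ti) ** 2 for hi, ti in zip(h, target))
    out_term = (output(x_adv) - output(x)) ** 2
    return expl_term + gamma * out_term


def manipulate(x, target, gamma=10.0, steps=200, lr=0.5, eps=1e-5):
    # Gradient descent with a finite-difference gradient and backtracking,
    # so every accepted step strictly lowers the loss.
    x_adv = list(x)
    for _ in range(steps):
        current = loss(x_adv, x, target, gamma)
        grad = []
        for i in range(len(x_adv)):
            hi = list(x_adv)
            lo = list(x_adv)
            hi[i] += eps
            lo[i] -= eps
            grad.append((loss(hi, x, target, gamma)
                         - loss(lo, x, target, gamma)) / (2 * eps))
        step, improved = lr, False
        while step > 1e-8:
            cand = [xi - step * gi for xi, gi in zip(x_adv, grad)]
            if loss(cand, x, target, gamma) < current:
                x_adv, improved = cand, True
                break
            step /= 2
        if not improved:
            break
    return x_adv


if __name__ == "__main__":
    x = [0.5, -0.5]
    target = [0.2, 0.2]  # arbitrary target explanation
    x_adv = manipulate(x, target)
    print("explanation before:", explanation(x))
    print("explanation after: ", explanation(x_adv))
    print("output change:     ", abs(output(x_adv) - output(x)))
```

Softplus is used here only because the loss depends on the gradient of the explanation, which needs a smooth activation. Raising `gamma` trades explanation movement for a tighter match on the output.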