Input gradients have a pivotal role in a variety of applications, including
adversarial attack algorithms for evaluating model robustness, explainable AI
techniques for generating Saliency Maps, and counterfactual
explanations. However, Saliency Maps generated by traditional neural networks
are often noisy and provide limited insights. In this paper, we demonstrate
that, on the contrary, the Saliency Maps of 1-Lipschitz neural networks,
learned with the dual loss of an optimal transportation problem, exhibit
desirable XAI properties: they are highly concentrated on the essential parts of
the image with low noise, significantly outperforming state-of-the-art
explanation approaches across various models and metrics. We also prove that
these maps align unprecedentedly well with human explanations on ImageNet. To
explain the particularly beneficial properties of the Saliency Map for such
models, we prove this gradient encodes both the direction of the transportation
plan and the direction towards the nearest adversarial attack. Following the
gradient down to the decision boundary is no longer considered an adversarial
attack, but rather a counterfactual explanation that explicitly transports the
input from one class to another. Thus, learning with such a loss jointly
optimizes the classification objective and the alignment of the gradient, i.e.,
the Saliency Map, with the transportation plan direction. These networks were
previously known to be certifiably robust by design, and we demonstrate that
they scale well to large problems and models, and that their explanations can be
obtained with a fast and straightforward method.
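The link between the input gradient, the decision boundary, and the counterfactual can be illustrated with a minimal linear toy model (a purely illustrative sketch, not the paper's deep 1-Lipschitz architecture): for f(x) = w·x with ‖w‖ = 1, the saliency map is the constant gradient w, and stepping along −f(x)·w lands exactly on the decision boundary, at a distance equal to the logit magnitude.

```python
import numpy as np

# Toy 1-Lipschitz "classifier": f(x) = w @ x with ||w|| = 1.
# (Hypothetical illustration; the paper's models are deep 1-Lipschitz networks.)
rng = np.random.default_rng(0)
w = rng.normal(size=8)
w /= np.linalg.norm(w)       # unit norm makes f exactly 1-Lipschitz

def f(x):
    return w @ x             # signed logit; sign(f) is the predicted class

def saliency(x):
    return w                 # input gradient of f, constant for a linear model

x = rng.normal(size=8)
g = saliency(x)

# Following the gradient down by |f(x)| reaches the decision boundary:
x_cf = x - f(x) * g          # counterfactual: nearest point of the other class
assert abs(f(x_cf)) < 1e-12

# For a 1-Lipschitz f, the distance travelled equals the logit magnitude,
# i.e. the certified robustness radius:
assert np.isclose(np.linalg.norm(x - x_cf), abs(f(x)))
```

Here the gradient direction doubles as the transport direction between the two classes, which is the geometric property the abstract attributes to networks trained with the dual optimal-transport loss.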
External datasets
ClickMe
FashionMNIST
CelebA
Cat vs Dog
ImageNet