Abstract
Many methods have been developed to understand complex predictive models, and
high expectations are placed on post-hoc model explainability. It turns out
that such explanations are neither robust nor trustworthy, and they can be fooled.
This paper presents techniques for attacking Partial Dependence (plots,
profiles, PDP), which are among the most popular methods of explaining any
predictive model trained on tabular data. We showcase that PD can be
manipulated in an adversarial manner, which is alarming, especially in
financial or medical applications where auditability has become a must-have trait
supporting black-box machine learning. The fooling is performed via poisoning
the data to bend and shift explanations in the desired direction using genetic
and gradient algorithms. We believe this to be the first work using a genetic
algorithm for manipulating explanations, which is transferable as it
generalizes both ways: in a model-agnostic and an explanation-agnostic manner.
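To make the idea of fooling PD "via poisoning the data" concrete, here is a minimal toy sketch, not the authors' algorithm. It rests on several assumptions: a hypothetical fixed model `model` in which feature 0 interacts with feature 1, PD computed by hand as the average prediction over the dataset, and an illustrative genetic loop (Gaussian mutation, row-swapping crossover, elitist selection) whose names and parameters (`genetic_poison`, `pop_size`, `sigma`, etc.) are hypothetical. The model is never touched; only the data used to compute the explanation changes, which is what lets the PD curve of feature 0 drift toward an attacker-chosen target.

```python
import numpy as np


def model(X):
    # Hypothetical fixed "black-box" model: feature 0 interacts with feature 1,
    # so the PD of feature 0 depends on how feature 1 is distributed in the data.
    return X[:, 0] + X[:, 0] * X[:, 1]


def partial_dependence(predict, X, feature, grid):
    """PD(z) = average prediction after setting column `feature` to z in every row."""
    values = []
    for z in grid:
        X_z = X.copy()
        X_z[:, feature] = z
        values.append(predict(X_z).mean())
    return np.array(values)


def pd_loss(predict, X, feature, grid, target):
    # Mean squared distance between the current PD curve and the attacker's target.
    return np.mean((partial_dependence(predict, X, feature, grid) - target) ** 2)


def genetic_poison(predict, X, feature, grid, target,
                   pop_size=20, generations=50, sigma=0.3, seed=0):
    """Evolve perturbed copies of X whose PD curve moves toward `target`.

    Only the data changes, never the model. The explained column is kept
    fixed, which is an assumption of this sketch.
    """
    rng = np.random.default_rng(seed)
    others = np.arange(X.shape[1]) != feature  # columns the attacker may perturb
    population = [X.copy() for _ in range(pop_size)]

    for _ in range(generations):
        # Mutation: Gaussian noise on the non-explained columns.
        children = []
        for parent in population:
            child = parent.copy()
            child[:, others] += rng.normal(0.0, sigma, size=(X.shape[0], others.sum()))
            children.append(child)

        # Crossover: random pairs of children exchange a random subset of rows.
        order = rng.permutation(len(children))
        for a, b in zip(order[::2], order[1::2]):
            rows = rng.random(X.shape[0]) < 0.5
            swapped = children[a][rows].copy()
            children[a][rows] = children[b][rows]
            children[b][rows] = swapped

        # Elitist selection: keep the pop_size best datasets among parents and children.
        candidates = population + children
        candidates.sort(key=lambda D: pd_loss(predict, D, feature, grid, target))
        population = candidates[:pop_size]

    return population[0]


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.normal(size=(100, 2))
    grid = np.linspace(-2, 2, 11)
    target = 2 * grid  # attacker wants feature 0 to look twice as influential
    X_poisoned = genetic_poison(model, X, 0, grid, target)
    print("loss before:", pd_loss(model, X, 0, grid, target))
    print("loss after: ", pd_loss(model, X_poisoned, 0, grid, target))
```

Because selection is elitist and the initial population consists of copies of the clean data, the best loss can never exceed the clean-data loss. In this toy model the PD of feature 0 equals z(1 + mean of feature 1), so the search mainly succeeds by shifting the distribution of the interacting feature.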