Abstract
The adoption of artificial intelligence (AI) across industries has led to the
widespread use of complex black-box models and interpretation tools for
decision making. This paper proposes an adversarial framework to uncover the
vulnerability of permutation-based interpretation methods for machine learning
tasks, with a particular focus on partial dependence (PD) plots. This
adversarial framework modifies the original black-box model to manipulate its
predictions for instances in the extrapolation domain. As a result, it produces
deceptive PD plots that can conceal discriminatory behaviors while preserving
most of the original model's predictions. This framework can produce multiple
fooled PD plots via a single model. Using real-world datasets, including an
auto insurance claims dataset and the COMPAS (Correctional Offender Management
Profiling for Alternative Sanctions) dataset, we show that it is
possible to intentionally hide the discriminatory behavior of a predictor and
make the black-box model appear neutral through interpretation tools like PD
plots while retaining almost all the predictions of the original black-box
model. Managerial insights for regulators and practitioners are provided based
on the findings.
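
To make the mechanism concrete, the following is a minimal toy sketch of the idea, not the paper's actual algorithm. It assumes two strongly correlated synthetic features, a hypothetical predictor that depends directly on the first one, and a simple distance rule as the "extrapolation domain" detector. All names (original_model, adversarial_model, in_domain, TAU, TARGET) are illustrative assumptions. Because a PD plot evaluates the model on artificial points where one feature is forced to a grid value, many of those points fall outside the observed data; changing the model only there can flatten the PD curve while leaving predictions on the observed data untouched.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000
x1 = rng.uniform(0.0, 1.0, n)            # hypothetical sensitive feature
x2 = x1 + rng.uniform(-0.05, 0.05, n)    # strongly correlated proxy feature
X = np.column_stack([x1, x2])


def original_model(X):
    # Toy "discriminatory" predictor: the output depends directly on x1.
    return X[:, 0]


TAU = 0.1                                # assumed width of the data manifold
TARGET = original_model(X).mean()        # flat level the fooled PD should show


def in_domain(X):
    # Assumed detector: observed data always satisfies |x1 - x2| <= 0.05.
    return np.abs(X[:, 0] - X[:, 1]) <= TAU


def adversarial_model(X_query, x2_ref=x2):
    X_query = np.asarray(X_query, dtype=float)
    out = original_model(X_query).copy()
    off = ~in_domain(X_query)
    v = X_query[off, 0]
    # Share of PD evaluation points that stay in-domain when x1 is set to v.
    p = (np.abs(v[:, None] - x2_ref[None, :]) <= TAU).mean(axis=1)
    # Compensating output so the average over all PD points equals TARGET.
    out[off] = (TARGET - p * v) / (1.0 - p)
    return out


def partial_dependence(model, X, feature, grid):
    # Standard PD estimate: force one feature to each grid value and
    # average the model output over the rest of the data.
    values = []
    for v in grid:
        X_mod = X.copy()
        X_mod[:, feature] = v
        values.append(model(X_mod).mean())
    return np.array(values)


grid = np.linspace(0.1, 0.9, 9)
pd_original = partial_dependence(original_model, X, 0, grid)
pd_fooled = partial_dependence(adversarial_model, X, 0, grid)
same_on_data = np.array_equal(adversarial_model(X), original_model(X))
```

In this toy setting, pd_original rises with the grid (the dependence on x1 is visible), pd_fooled is flat at TARGET, and same_on_data is True because every observed row lies inside the assumed domain. Real data would need a learned domain detector and would generally retain most, rather than all, of the original predictions.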