Fooling SHAP with Output Shuffling Attacks

References

2019 International Conference on Computational Intelligence in Data Science (ICCIDS)

A comparison of regression models for prediction of graduate admissions

M. S. Acharya, A. Armaan, A. S. Antony

Published: 2019

The Eleventh International Conference on Learning Representations

Fooling SHAP with Stealthily Biased Sampling

U. Aïvodji, S. Hara, M. Marchand, F. Khomh

Published: 2022

Information Fusion

Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI

A. B. Arrieta, N. Díaz-Rodríguez, J. Del Ser, A. Bennetot, S. Tabik, A. Barbado, S. García, S. Gil-Lopez, D. Molina, R. Benjamins

Published: 2020

Computing Research Repository (CoRR)

Adversarial attacks and defenses in explainable artificial intelligence: A survey

Hubert Baniecki, Przemyslaw Biecek

Published: 2023-06-06

Explainable artificial intelligence (XAI) methods are portrayed as a remedy for debugging and trusting statistical and deep learning models, as well as interpreting their predictions. However, recent advances in adversarial machine learning (AdvML) highlight the limitations and vulnerabilities of state-of-the-art explanation methods, putting their security and trustworthiness into question. The possibility of manipulating, fooling or fairwashing evidence of the model's reasoning has detrimental consequences when applied in high-stakes decision-making and knowledge discovery. This survey provides a comprehensive overview of research concerning adversarial attacks on explanations of machine learning models, as well as fairness metrics. We introduce a unified notation and taxonomy of methods facilitating a common ground for researchers and practitioners from the intersecting research fields of AdvML and XAI. We discuss how to defend against attacks and design robust interpretation methods. We contribute a list of existing insecurities in XAI and outline the emerging research directions in adversarial XAI (AdvXAI). Future work should address improving explanation methods and evaluation protocols to take into account the reported safety issues.

AI Fairness 360: An Extensible Toolkit for Detecting, Understanding, and Mitigating Unwanted Algorithmic Bias

R. K. E. Bellamy, K. Dey, M. Hind, S. C. Hoffman, S. Houde, K. Kannan, P. Lohia, J. Martino, S. Mehta, A. Mojsilovic, S. Nagar, K. N. Ramamurthy, J. Richards, D. Saha, P. Sattigeri, M. Singh, K. R. Varshney, Y. Zhang

Published: 2018

International Conference on Artificial Intelligence and Statistics

From Shapley values to generalized additive models and back

S. Bordt, U. von Luxburg

Published: 2023

Improving KernelSHAP: Practical Shapley value estimation via linear regression

I. Covert, S.-I. Lee

Published: 2020

Opportunities and challenges in explainable artificial intelligence (XAI): A survey

A. Das, P. Rad

Published: 2020

You shouldn’t trust me: Learning models which conceal unfairness from multiple explanation methods

B. Dimanov, U. Bhatt, M. Jamnik, A. Weller

Published: 2020

UCI Machine Learning Repository

Statlog (German Credit Data)

H. Hofmann

Published: 1994

Computer Vision and Machine Intelligence in Medical Image Analysis

Likelihood prediction of diabetes at early stage using data mining techniques

M. Islam, R. Ferdousi, S. Rahman, H. Y. Bushra

Published: 2020

International Conference on Learning Representations (ICLR)

Fool SHAP with Stealthily Biased Sampling

G. Laberge, U. Aïvodji, S. Hara, M. Marchand, F. Khomh

Published: 2023

NIPS

A Unified Approach to Interpreting Model Predictions

Scott Lundberg, Su-In Lee

Published: 2017-05-23

Understanding why a model makes a certain prediction can be as crucial as the prediction's accuracy in many applications. However, the highest accuracy for large modern datasets is often achieved by complex models that even experts struggle to interpret, such as ensemble or deep learning models, creating a tension between accuracy and interpretability. In response, various methods have recently been proposed to help users interpret the predictions of complex models, but it is often unclear how these methods are related and when one method is preferable over another. To address this problem, we present a unified framework for interpreting predictions, SHAP (SHapley Additive exPlanations). SHAP assigns each feature an importance value for a particular prediction. Its novel components include: (1) the identification of a new class of additive feature importance measures, and (2) theoretical results showing there is a unique solution in this class with a set of desirable properties. The new class unifies six existing methods, notable because several recent methods in the class lack the proposed desirable properties. Based on insights from this unification, we present new methods that show improved computational performance and/or better consistency with human intuition than previous approaches.
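
SHAP builds on the Shapley-value attribution, which can be illustrated with a brute-force computation. The sketch below is a hedged illustration and not the SHAP library's API. `shapley_values` is a hypothetical helper. It assumes an interventional value function in which absent features are replaced by a fixed `baseline` vector. That is one common choice; the paper itself defines the values through conditional expectations. Because it enumerates every coalition, it is only practical for a handful of features.

```python
from itertools import combinations
from math import factorial
from typing import Callable, List, Sequence


def shapley_values(
    f: Callable[[List[float]], float],
    x: Sequence[float],
    baseline: Sequence[float],
) -> List[float]:
    """Exact Shapley values of f at x by enumerating every coalition (exponential in len(x))."""
    n = len(x)

    def value(coalition: set) -> float:
        # Assumed interventional value function: features in the coalition
        # keep their value from x, all others are set to the baseline.
        z = [x[j] if j in coalition else baseline[j] for j in range(n)]
        return f(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):  # coalition sizes 0 .. n-1 drawn from the other features
            for subset in combinations(others, size):
                s = set(subset)
                # Shapley weight |S|! (n - |S| - 1)! / n!
                weight = factorial(len(s)) * factorial(n - len(s) - 1) / factorial(n)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi


# Toy model with an interaction term; attributions must sum to f(x) - f(baseline).
model = lambda z: 2 * z[0] + 3 * z[1] + z[0] * z[1]
phi = shapley_values(model, x=[1.0, 1.0], baseline=[0.0, 0.0])
print(phi)  # [2.5, 3.5]
```

The interaction term z[0] * z[1] is split equally between the two features. The attributions also sum to f(x) - f(baseline) = 6, which is the additivity (efficiency) property the abstract refers to.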

2023 IEEE Symposium on Security and Privacy (SP)

Disguising attacks with explanation-aware backdoors

M. Noppel, L. Peter, C. Wressnegger

Published: 2023

arXiv

"Why Should I Trust You?": Explaining the Predictions of Any Classifier

Marco Tulio Ribeiro, Sameer Singh, Carlos Guestrin

Published: 2016-02-16

Despite widespread adoption, machine learning models remain mostly black boxes. Understanding the reasons behind predictions is, however, quite important in assessing trust, which is fundamental if one plans to take action based on a prediction, or when choosing whether to deploy a new model. Such understanding also provides insights into the model, which can be used to transform an untrustworthy model or prediction into a trustworthy one. In this work, we propose LIME, a novel explanation technique that explains the predictions of any classifier in an interpretable and faithful manner, by learning an interpretable model locally around the prediction. We also propose a method to explain models by presenting representative individual predictions and their explanations in a non-redundant way, framing the task as a submodular optimization problem. We demonstrate the flexibility of these methods by explaining different models for text (e.g. random forests) and image classification (e.g. neural networks). We show the utility of explanations via novel experiments, both simulated and with human subjects, on various scenarios that require trust: deciding if one should trust a prediction, choosing between models, improving an untrustworthy classifier, and identifying why a classifier should not be trusted.
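
LIME's core step fits an interpretable model to perturbed samples, weighted by their proximity to the instance being explained. This step can be sketched with NumPy. The sketch is a simplified illustration under stated assumptions, not the `lime` package:

- `lime_tabular_sketch` is a hypothetical helper.
- Perturbations are Gaussian noise on continuous features.
- The proximity weight is an exponential kernel exp(-d^2 / σ^2) on Euclidean distance.
- The surrogate is an unregularized weighted linear regression.

```python
import numpy as np


def lime_tabular_sketch(f, x, num_samples=500, noise_scale=1.0, kernel_width=1.0, seed=0):
    """Fit a locally weighted linear surrogate to the black box f around x.

    Returns (intercept, coefficients). Simplified illustration, not the lime package.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)

    # 1. Perturb the instance (assumption: Gaussian noise on continuous features).
    samples = x + rng.normal(0.0, noise_scale, size=(num_samples, x.size))
    labels = np.array([f(z) for z in samples], dtype=float)

    # 2. Weight each sample by proximity to x with an exponential kernel.
    dist = np.linalg.norm(samples - x, axis=1)
    weights = np.exp(-(dist ** 2) / kernel_width ** 2)

    # 3. Weighted least squares: scale rows by sqrt(weight), then solve.
    design = np.column_stack([np.ones(num_samples), samples])
    root_w = np.sqrt(weights)
    coef, *_ = np.linalg.lstsq(design * root_w[:, None], labels * root_w, rcond=None)
    return coef[0], coef[1:]


# Black box that is exactly linear, so the local surrogate should recover it.
black_box = lambda z: 2.0 * z[0] - 1.0 * z[1] + 1.0
intercept, coefs = lime_tabular_sketch(black_box, x=[0.5, -0.5])
print(np.round(coefs, 6))
```

Because this black box is linear, the surrogate recovers its coefficients (about 2 and -1) and intercept (about 1). For a nonlinear model, the coefficients describe only the neighbourhood set by `kernel_width`.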

Advances in Neural Information Processing Systems

Reliable post hoc explanations: Modeling uncertainty in explainability

D. Slack, A. Hilgard, S. Singh, H. Lakkaraju

Published: 2021

AAAI/ACM Conference on AI, Ethics, and Society (AIES)

Fooling LIME and SHAP: Adversarial Attacks on Post hoc Explanation Methods

Dylan Slack, Sophie Hilgard, Emily Jia, Sameer Singh, Himabindu Lakkaraju

Published: 2019-11-07

As machine learning black boxes are increasingly being deployed in domains such as healthcare and criminal justice, there is growing emphasis on building tools and techniques for explaining these black boxes in an interpretable manner. Such explanations are being leveraged by domain experts to diagnose systematic errors and underlying biases of black boxes. In this paper, we demonstrate that post hoc explanations techniques that rely on input perturbations, such as LIME and SHAP, are not reliable. Specifically, we propose a novel scaffolding technique that effectively hides the biases of any given classifier by allowing an adversarial entity to craft an arbitrary desired explanation. Our approach can be used to scaffold any biased classifier in such a way that its predictions on the input data distribution still remain biased, but the post hoc explanations of the scaffolded classifier look innocuous. Using extensive evaluation with multiple real-world datasets (including COMPAS), we demonstrate how extremely biased (racist) classifiers crafted by our framework can easily fool popular explanation techniques such as LIME and SHAP into generating innocuous explanations which do not reflect the underlying biases.
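
The scaffolding idea can be sketched as a wrapper that routes each query through an out-of-distribution detector. This is a hedged illustration, not the authors' code. `ScaffoldedClassifier` is a hypothetical name. A nearest-neighbour distance threshold stands in for the detector the paper trains to separate real data from perturbed samples. The wrapper answers real-looking inputs with the biased model and answers off-distribution perturbations, such as those LIME and SHAP generate, with an innocuous one.

```python
import numpy as np


class ScaffoldedClassifier:
    """Routes in-distribution inputs to a biased model and everything else to an innocuous one."""

    def __init__(self, biased, innocuous, reference_data, threshold):
        self.biased = biased
        self.innocuous = innocuous
        self.reference = np.asarray(reference_data, dtype=float)
        self.threshold = threshold

    def is_in_distribution(self, x):
        # Stand-in OOD detector (assumption): an input counts as "real" if it lies
        # within `threshold` of some reference data point.
        dist = np.linalg.norm(self.reference - np.asarray(x, dtype=float), axis=1)
        return bool(dist.min() <= self.threshold)

    def predict(self, x):
        model = self.biased if self.is_in_distribution(x) else self.innocuous
        return model(x)


# Feature 0 plays the role of a sensitive attribute, feature 1 an unrelated one.
biased = lambda x: int(x[0] > 0.5)      # decision depends only on the sensitive feature
innocuous = lambda x: int(x[1] > 0.5)   # decision depends only on the unrelated feature
clf = ScaffoldedClassifier(biased, innocuous, reference_data=[[1.0, 0.0], [0.0, 0.0]], threshold=0.1)

print(clf.predict([1.0, 0.0]))  # real data point -> biased model -> 1
print(clf.predict([1.0, 0.3]))  # perturbed point -> innocuous model -> 0
```

An explainer that queries only perturbed points sees the innocuous model. Its attributions therefore point to feature 1, even though every decision on real data depends on feature 0.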

Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining

A unified approach to quantifying algorithmic unfairness: Measuring individual & group unfairness via inequality indices

T. Speicher, H. Heidari, N. Grgic-Hlaca, K. P. Gummadi, A. Singla, A. Weller, M. B. Zafar

Published: 2018

Proceedings of the 29th International Conference on Scientific and Statistical Database Management

Measuring Fairness in Ranked Outputs

K. Yang, J. Stoyanovich

Published: 2017

The Visual Computer

TRIVEA: transparent ranking interpretation using visual explanation of black-box algorithmic rankers

J. Yuan, K. Bhattacharjee, A. Z. Islam, A. Dasgupta

Published: 2023

Proceedings of the Workshop on Human-In-the-Loop Data Analytics

A Human-in-the-loop Workflow for Multi-Factorial Sensitivity Analysis of Algorithmic Rankers

J. Yuan, A. Dasgupta

Published: 2023