The privacy issue of counterfactual explanations: explanation linkage attacks

Authors: Sofie Goethals, Kenneth Sörensen, David Martens | Published: 2022-10-21

2022.10.21 / 2025.05.28

Source: https://arxiv.org/abs/2210.12051

PDF: https://arxiv.org/pdf/2210.12051

Labels Predicted by AI

Privacy Violation, Counterfactual Explanation, Evaluation Method

Please note that these labels were automatically added by AI. Therefore, they may not be entirely accurate.

Abstract

Black-box machine learning models are being used in more and more high-stakes domains, which creates a growing need for Explainable AI (XAI). Unfortunately, the use of XAI in machine learning introduces new privacy risks, which currently remain largely unnoticed. We introduce the explanation linkage attack, which can occur when deploying instance-based strategies to find counterfactual explanations. To counter such an attack, we propose k-anonymous counterfactual explanations and introduce pureness as a new metric to evaluate the validity of these k-anonymous counterfactual explanations. Our results show that making the explanations, rather than the whole dataset, k-anonymous is beneficial for the quality of the explanations.
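
The attack named in the abstract can be illustrated with a toy sketch. This is not the authors' implementation. The data, the field names, the distance function, and the helpers `instance_based_counterfactual`, `link`, and `anonymity_set_size` are all hypothetical and chosen only for illustration. The sketch assumes an instance-based explainer that returns an actual training record as the counterfactual, and an attacker who holds a public register sharing some quasi-identifiers with that record. It also shows how a generalized explanation (ranges and sets instead of exact values) can match at least k individuals.

```python
# Toy data: every record, name, and value below is hypothetical.
TRAINING_SET = [
    {"age": 34, "zip": "2000", "gender": "F", "income": 52000, "approved": True},
    {"age": 36, "zip": "2018", "gender": "F", "income": 61000, "approved": True},
    {"age": 51, "zip": "2000", "gender": "M", "income": 48000, "approved": True},
    {"age": 29, "zip": "2060", "gender": "M", "income": 21000, "approved": False},
]
PUBLIC_REGISTER = [
    {"name": "Alice", "age": 34, "zip": "2000", "gender": "F"},
    {"name": "Bea", "age": 36, "zip": "2018", "gender": "F"},
    {"name": "Carl", "age": 51, "zip": "2000", "gender": "M"},
]
QUASI_IDENTIFIERS = ("age", "zip", "gender")


def instance_based_counterfactual(query, data):
    """Return the closest training record that received the desired outcome.

    The distance (age gap plus income gap in thousands) is an arbitrary
    choice made for this sketch.
    """
    candidates = [r for r in data if r["approved"]]
    return min(
        candidates,
        key=lambda r: abs(r["age"] - query["age"])
        + abs(r["income"] - query["income"]) / 1000,
    )


def link(explanation, register):
    """Names of register entries that match the explanation on every quasi-identifier."""
    return [
        p["name"]
        for p in register
        if all(p[q] == explanation[q] for q in QUASI_IDENTIFIERS)
    ]


def anonymity_set_size(generalized, register):
    """Count register entries consistent with a generalized explanation.

    `generalized` maps "age" to an inclusive (low, high) range and the
    other quasi-identifiers to sets of allowed values.
    """
    low, high = generalized["age"]
    return sum(
        1
        for p in register
        if low <= p["age"] <= high
        and p["zip"] in generalized["zip"]
        and p["gender"] in generalized["gender"]
    )


# A rejected applicant asks for an explanation.
query = {"age": 33, "zip": "2060", "gender": "F", "income": 20000}
cf = instance_based_counterfactual(query, TRAINING_SET)

# The exact quasi-identifiers single out one person, exposing her income.
print(link(cf, PUBLIC_REGISTER), cf["income"])

# A generalized explanation is consistent with more than one person.
generalized = {"age": (30, 40), "zip": {"2000", "2018"}, "gender": {"F"}}
print(anonymity_set_size(generalized, PUBLIC_REGISTER))
```

In this toy setting the exact counterfactual matches a single register entry, so the private attribute (income) is linked to a named person, while the generalized explanation matches two entries and would count as 2-anonymous under the usual definition of k-anonymity.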