XRand: Differentially Private Defense against Explanation-Guided Attacks

Authors: Truc Nguyen, Phung Lai, NhatHai Phan, My T. Thai | Published: 2022-12-08 | Updated: 2022-12-14

2022.12.082025.04.03

Authors: Truc Nguyen, Phung Lai, NhatHai Phan, My T. Thai
Published: 2022-12-08 | Updated: 2022-12-14

Source: https://arxiv.org/abs/2212.04454

PDF: https://arxiv.org/pdf/2212.04454

AIにより推定されたラベル

モデル情報を秘匿しつつ、説明性を提供する手法プライバシー評価差分プライバシー

※ こちらのラベルはAIによって自動的に追加されました。そのため、正確でないことがあります。
詳細は文献データベースについてをご覧ください。

Abstract

Recent development in the field of explainable artificial intelligence (XAI) has helped improve trust in Machine-Learning-as-a-Service (MLaaS) systems, in which an explanation is provided together with the model prediction in response to each query. However, XAI also opens a door for adversaries to gain insights into the black-box models in MLaaS, thereby making the models more vulnerable to several attacks. For example, feature-based explanations (e.g., SHAP) could expose the top important features that a black-box model focuses on. Such disclosure has been exploited to craft effective backdoor triggers against malware classifiers. To address this trade-off, we introduce a new concept of achieving local differential privacy (LDP) in the explanations, and from that we establish a defense, called XRand, against such attacks. We show that our mechanism restricts the information that the adversary can learn about the top important features, while maintaining the faithfulness of the explanations.