XRand: Differentially Private Defense against Explanation-Guided Attacks


arxiv

AI Security Portal bot

Information in the literature database is collected automatically.

Source

https://arxiv.org/abs/2212.04454

PDF

https://arxiv.org/pdf/2212.04454

Paper Information

Authors: Truc Nguyen, Phung Lai, NhatHai Phan, My T. Thai
Published: 12-9-2022
Updated: 12-15-2022
Affiliation: University of Florida, Gainesville, FL 32611
Country: United States of America

Labels Estimated by AI

Method for Providing Explainability while Keeping Model Information Confidential

Privacy Assessment

Differential Privacy

These labels were automatically added by AI and may be inaccurate.

Abstract

Recent development in the field of explainable artificial intelligence (XAI) has helped improve trust in Machine-Learning-as-a-Service (MLaaS) systems, in which an explanation is provided together with the model prediction in response to each query. However, XAI also opens a door for adversaries to gain insights into the black-box models in MLaaS, thereby making the models more vulnerable to several attacks. For example, feature-based explanations (e.g., SHAP) could expose the top important features that a black-box model focuses on. Such disclosure has been exploited to craft effective backdoor triggers against malware classifiers. To address this trade-off, we introduce a new concept of achieving local differential privacy (LDP) in the explanations, and from that we establish a defense, called XRand, against such attacks. We show that our mechanism restricts the information that the adversary can learn about the top important features, while maintaining the faithfulness of the explanations.
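
To make the idea of local differential privacy over feature-based explanations more concrete, the following is a minimal sketch of generic k-ary randomized response applied to a single reported feature index. It is not the XRand mechanism, which this entry does not specify; the function names, the choice of randomized response, and the example parameters are all assumptions made for illustration only.

```python
import math
import random


def krr_keep_probability(epsilon: float, domain_size: int) -> float:
    """Probability of reporting the true value under k-ary randomized response.

    With p = e^eps / (e^eps + k - 1) and every other value reported with
    probability q = (1 - p) / (k - 1), the ratio p / q equals e^eps, which is
    the epsilon-LDP bound.
    """
    e = math.exp(epsilon)
    return e / (e + domain_size - 1)


def randomize_feature_index(true_index: int, domain_size: int,
                            epsilon: float, rng: random.Random) -> int:
    """Hypothetical helper: privatize one 'top feature' index.

    Returns true_index with probability p, otherwise a uniformly chosen
    different index from range(domain_size).
    """
    p = krr_keep_probability(epsilon, domain_size)
    if rng.random() < p:
        return true_index
    # Draw from the other domain_size - 1 indices, skipping true_index.
    other = rng.randrange(domain_size - 1)
    return other if other < true_index else other + 1


if __name__ == "__main__":
    rng = random.Random(0)
    # Assumed example: 100 features, true top feature is index 7.
    reports = [randomize_feature_index(7, 100, 1.0, rng) for _ in range(5)]
    print(reports)
```

A smaller epsilon makes the reported index closer to uniform noise, so an observer learns less about the true top feature, while a larger epsilon keeps the report more faithful. This is the same privacy and faithfulness trade-off the abstract describes, shown here in a much simpler setting.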

External Datasets

EMBER

Contagio PDF

Drebin