Model Interpretability

MEGEX: Data-Free Model Extraction Attack against Gradient-Based Explainable AI

Authors: Takayuki Miura, Satoshi Hasegawa, Toshiki Shibahara | Published: 2021-07-19
Model Extraction Attack
Model Interpretability
Attack Method

When and How to Fool Explainable Models (and Humans) with Adversarial Examples

Authors: Jon Vadillo, Roberto Santana, Jose A. Lozano | Published: 2021-07-05 | Updated: 2023-07-07
Model Interpretability
Adversarial Example
Adversarial Attack

Generating End-to-End Adversarial Examples for Malware Classifiers Using Explainability

Authors: Ishai Rosenberg, Shai Meir, Jonathan Berrebi, Ilay Gordon, Guillaume Sicard, Eli David | Published: 2020-09-28 | Updated: 2022-06-01
Malware Classification
Model Interpretability
Adversarial Example

Mixup Inference: Better Exploiting Mixup to Defend Adversarial Attacks

Authors: Tianyu Pang, Kun Xu, Jun Zhu | Published: 2019-09-25 | Updated: 2020-02-20
Model Interpretability
Adversarial Example
Adversarial Attack

Evaluating Explanation Without Ground Truth in Interpretable Machine Learning

Authors: Fan Yang, Mengnan Du, Xia Hu | Published: 2019-07-16 | Updated: 2019-08-15
XAI (Explainable AI)
Model Interpretability
Adversarial Example

Explanations can be manipulated and geometry is to blame

Authors: Ann-Kathrin Dombrowski, Maximilian Alber, Christopher J. Anders, Marcel Ackermann, Klaus-Robert Müller, Pan Kessel | Published: 2019-06-19 | Updated: 2019-09-25
Model Interpretability
Robustness Evaluation
Attacks on Explainability