Model Reconstruction Using Counterfactual Explanations: A Perspective From Polytope Theory
Abstract
Counterfactual explanations provide ways of achieving a favorable model outcome with minimum input perturbation. However, they can also be leveraged to reconstruct the model by strategically training a surrogate model to give predictions similar to those of the original (target) model. In this work, we analyze how model reconstruction using counterfactuals can be improved by further leveraging the fact that counterfactuals also lie quite close to the decision boundary. Our main contribution is to derive novel theoretical relationships between the error in model reconstruction and the number of counterfactual queries required, using polytope theory. This analysis leads us to propose a model reconstruction strategy that we call Counterfactual Clamping Attack (CCA), which trains a surrogate model using a unique loss function that treats counterfactuals differently from ordinary instances. Our approach also alleviates the related problem of decision-boundary shift that arises in existing model reconstruction approaches when counterfactuals are treated as ordinary instances. Experimental results demonstrate that our strategy improves fidelity between the target and surrogate model predictions on several datasets.
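To make the core idea concrete, the following is a minimal sketch (not the paper's exact formulation) of a "clamping"-style surrogate loss: ordinary queried instances are fit with standard binary cross-entropy, while counterfactuals, which lie just on the favorable side of the target's boundary, incur loss only when the surrogate's score falls below a threshold. The names `clamping_loss` and `tau` are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def bce(p, y):
    # Standard binary cross-entropy for probabilities p and labels y.
    eps = 1e-12
    p = np.clip(p, eps, 1 - eps)
    return -(y * np.log(p) + (1 - y) * np.log(1 - p))

def clamping_loss(p, y, is_cf, tau=0.5):
    """Illustrative clamping-style loss (assumed form, not the paper's).

    Ordinary instances: usual BCE toward their observed label y.
    Counterfactuals (is_cf True): penalized toward the favorable
    class only while p < tau; once p >= tau the loss is clamped to
    zero, so training does not keep pushing the surrogate boundary
    past the counterfactuals (the decision-boundary-shift problem).
    """
    ordinary = bce(p, y)
    clamped = np.where(p >= tau, 0.0, bce(p, np.ones_like(p)))
    return np.where(is_cf, clamped, ordinary)
```

A counterfactual already scored above `tau` contributes zero loss, whereas treating it as an ordinary positive instance would keep pulling its score toward 1 and shift the learned boundary away from the target's.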