Post-hoc explanation techniques are methods applied after training to
explain how black-box machine learning models produce their outcomes.
Among post-hoc explanation techniques, counterfactual explanations are becoming
one of the most popular methods to achieve this objective. In particular, in
addition to highlighting the most important features used by the black-box
model, they provide users with actionable explanations in the form of data
instances that would have received a different outcome. Nonetheless, by doing
so, they also leak non-trivial information about the model itself, which raises
privacy issues. In this work, we demonstrate how an adversary can leverage the
information provided by counterfactual explanations to mount high-fidelity and
high-accuracy model extraction attacks. More precisely, our attack enables the
adversary to build a faithful copy of a target model by accessing its
counterfactual explanations. The empirical evaluation of the proposed attack on
black-box models trained on real-world datasets demonstrates that it can
achieve high-fidelity and high-accuracy extraction even under low query
budgets.
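
To make the idea of extraction from counterfactuals concrete, the following is a minimal toy sketch and not the attack evaluated in this work. It assumes a hypothetical linear target model, an idealized oracle that returns the closest point with the opposite label, and a logistic-regression surrogate trained by plain gradient descent. All names (`target_predict`, `counterfactual`, `extract`, `fidelity`) and all parameter values are illustrative assumptions. Because each counterfactual lies just across the decision boundary, it gives the adversary a labeled point close to that boundary, which is what makes a small query budget informative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical target: a linear classifier standing in for the black box.
W_TARGET = np.array([2.0, -1.0])
B_TARGET = 0.5


def target_predict(X):
    """Black-box prediction API: returns 0/1 labels only."""
    return (X @ W_TARGET + B_TARGET > 0).astype(int)


def counterfactual(x, margin=0.05):
    """Idealized explanation oracle (assumption): the closest point that
    receives the opposite label, placed `margin` past the boundary."""
    score = x @ W_TARGET + B_TARGET
    norm = np.linalg.norm(W_TARGET)
    to_boundary = (score / norm**2) * W_TARGET      # projection step
    overshoot = margin * np.sign(score) * W_TARGET / norm
    return x - to_boundary - overshoot


def fit_logreg(X, y, lr=0.5, steps=2000):
    """Surrogate model: logistic regression via batch gradient descent."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
        grad = p - y
        w -= lr * X.T @ grad / len(y)
        b -= lr * grad.mean()
    return w, b


def extract(n_queries):
    """Query the target, collect counterfactuals, train the surrogate on both."""
    queries = rng.uniform(-3, 3, size=(n_queries, 2))
    labels = target_predict(queries)
    cfs = np.array([counterfactual(q) for q in queries])
    X_train = np.vstack([queries, cfs])
    # A counterfactual carries the opposite label of its query by construction.
    y_train = np.concatenate([labels, 1 - labels])
    return fit_logreg(X_train, y_train)


def fidelity(w, b, n_test=5000):
    """Fraction of fresh points on which surrogate and target agree."""
    X = np.random.default_rng(1).uniform(-3, 3, size=(n_test, 2))
    surrogate = (X @ w + b > 0).astype(int)
    return float((surrogate == target_predict(X)).mean())
```

Calling `w, b = extract(20)` followed by `fidelity(w, b)` reports the agreement rate of the surrogate with the toy target under a budget of 20 queries. Real counterfactual explainers only approximate the closest opposite-label point, so this sketch overstates how cleanly the boundary is revealed.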