Abstract
Black-box attacks on machine learning models occur when an attacker, despite
having no access to the inner workings of a model, can successfully craft an
attack by means of model theft. The attacker trains their own substitute model
that mimics the model to be attacked. The substitute can then be used to design
attacks against the original model, for example by means of adversarial
samples. We put ourselves in the shoes of the defender and present a method
that can successfully prevent model theft by mounting a counter-attack.
Specifically, for each incoming query, we slightly perturb our output label
distribution in a way that makes substitute training infeasible. We demonstrate
that the perturbation does not affect the ordinary use of our model, but
results in an effective defense against attacks based on model theft.
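
To make the idea of perturbing the output label distribution concrete, the following is a minimal illustrative sketch. It is not the method of this paper, whose perturbation scheme is not specified in the abstract. It assumes the model returns a probability vector, and the function name `perturb_preserving_top1`, the `strength` parameter, and the Dirichlet noise are all hypothetical choices. The sketch only shows how an output can be altered while the predicted (top-1) label, and hence ordinary use, stays unchanged.

```python
import numpy as np

def perturb_preserving_top1(probs, strength=0.3, rng=None):
    """Return a perturbed probability vector with the same top-1 label.

    Hypothetical illustration only: mixes the true distribution with
    random Dirichlet noise, then restores the original argmax if needed.
    """
    rng = np.random.default_rng() if rng is None else rng
    probs = np.asarray(probs, dtype=float)
    top = int(np.argmax(probs))

    # Random distribution over the same classes (assumed noise model).
    noise = rng.dirichlet(np.ones(probs.size))

    # Convex combination of the true output and the noise.
    mixed = (1.0 - strength) * probs + strength * noise

    # If mixing displaced the predicted label, lift it back to the top.
    if int(np.argmax(mixed)) != top:
        mixed[top] = mixed.max() + 1e-6

    # Renormalize so the result is again a probability distribution.
    return mixed / mixed.sum()
```

A legitimate user who reads only the predicted label sees no change, whereas an attacker who trains a substitute on the full returned distribution receives distorted targets. Whether such simple noise actually makes substitute training infeasible is not claimed here.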