Machine learning (ML) applications are increasingly prevalent. Protecting the
confidentiality of ML models becomes paramount for two reasons: (a) a model can
be a business advantage to its owner, and (b) an adversary may use a stolen
model to find transferable adversarial examples that can evade classification
by the original model. Access to the model can be restricted so that it is
available only via well-defined prediction APIs. Nevertheless, prediction APIs still provide
enough information to allow an adversary to mount model extraction attacks by
sending repeated queries via the prediction API. In this paper, we describe new
model extraction attacks using novel approaches for generating synthetic
queries and for optimizing training hyperparameters. Our attacks outperform
state-of-the-art model extraction in terms of transferability of both targeted
and non-targeted adversarial examples (up to +29–44 percentage points, pp) and
prediction accuracy (up to +46 pp) on two datasets. We provide takeaways on
how to perform effective model extraction attacks. We then propose PRADA, the
first step towards generic and effective detection of DNN model extraction
attacks. It analyzes the distribution of consecutive API queries and raises an
alarm when this distribution deviates from benign behavior. We show that PRADA
can detect all prior model extraction attacks with no false positives.
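To make the detection idea concrete, the following minimal Python sketch illustrates distribution-based query monitoring in the spirit of the description above: it tracks the distances between a client's consecutive queries and raises an alarm when their distribution stops looking benign. The distance metric, the normality test, the class name, and all thresholds are illustrative assumptions, not the paper's exact algorithm.

# Illustrative sketch only: the distance metric, the Shapiro-Wilk normality
# test, and all parameters below are assumptions for illustration; they are
# not taken from the paper.
import numpy as np
from scipy import stats

class QueryDistributionMonitor:
    def __init__(self, warmup=30, w_threshold=0.90):
        self.warmup = warmup            # queries to collect before testing (assumed)
        self.w_threshold = w_threshold  # hypothetical alarm threshold on the test statistic
        self.past_queries = []          # previously observed queries from this client
        self.min_distances = []         # minimum distance of each new query to past ones

    def observe(self, query):
        """Record one client query (any array-like input); return True to raise an alarm."""
        q = np.asarray(query, dtype=float).ravel()
        if self.past_queries:
            # minimum L2 distance from the new query to any previously seen query
            self.min_distances.append(
                min(np.linalg.norm(q - p) for p in self.past_queries)
            )
        self.past_queries.append(q)
        if len(self.min_distances) < self.warmup:
            return False
        # Benign query streams tend to produce a natural-looking distance
        # distribution; synthetic extraction queries distort it. Here a failed
        # normality test stands in for "deviates from benign behavior".
        w_stat, _ = stats.shapiro(self.min_distances)
        return w_stat < self.w_threshold

In a deployment one such monitor would be kept per API client, with the client throttled or flagged for review once the monitor alarms.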