High-performance Deep Neural Networks (DNNs) are increasingly deployed in
many real-world applications, e.g., cloud prediction APIs. Recent advances in
model functionality stealing attacks via black-box access (i.e., inputs in,
predictions out) threaten the business model of such applications, which
require substantial time, money, and effort to develop. Existing defenses take a
passive role against stealing attacks, such as by truncating predicted
information. We find such passive defenses ineffective against DNN stealing
attacks. In this paper, we propose the first defense that actively perturbs
predictions, aiming to poison the training objective of the attacker. We
find our defense effective across a wide range of challenging datasets and DNN
model stealing attacks, and show that it additionally outperforms existing defenses. Our
defense is the first that can withstand highly accurate model stealing attacks
for tens of thousands of queries, amplifying the attacker's error rate by up to a
factor of 85$\times$ with minimal impact on the utility for benign users.
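
To make the mechanism concrete, below is a minimal illustrative sketch in Python of one simple instantiation of active prediction perturbation; it is an assumption on our part, not the perturbation objective actually optimized in the paper. The defender returns a perturbed posterior that keeps the top-1 label intact for benign users while moving the remaining probability mass, on which the attacker's training signal depends, toward an uninformative uniform distribution under an $L_1$ budget. The function name \texttt{perturb\_posterior} and the budget parameter \texttt{eps} are hypothetical.

\begin{verbatim}
# Illustrative sketch only: a simple active perturbation of the
# predicted posterior. This is a hypothetical simplification, not
# the perturbation objective optimized in the paper.
import numpy as np

def perturb_posterior(y: np.ndarray, eps: float = 0.5) -> np.ndarray:
    """Return y_tilde with argmax(y_tilde) == argmax(y) and
    ||y_tilde - y||_1 <= eps (utility constraint for benign users)."""
    k = int(y.argmax())
    # Misleading target: keep the winning class's probability, spread
    # the remaining mass uniformly so the attacker's training signal
    # carries less information about the victim model.
    target = np.full_like(y, (1.0 - y[k]) / (y.size - 1))
    target[k] = y[k]
    direction = target - y
    l1 = np.abs(direction).sum()
    step = min(1.0, eps / l1) if l1 > 0 else 0.0  # respect L1 budget
    y_tilde = y + step * direction
    return y_tilde / y_tilde.sum()  # stay on the probability simplex

if __name__ == "__main__":
    y = np.array([0.70, 0.20, 0.07, 0.03])
    print(perturb_posterior(y, eps=0.5))  # [0.7, 0.1, 0.1, 0.1]
\end{verbatim}

Because the top-1 label is preserved, a benign user who acts only on the predicted class is unaffected, while an attacker training a surrogate model on the full posteriors receives a degraded signal.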