Abstract
The state-of-the-art performance of deep learning models comes at a high cost
for companies and institutions, due to tedious data collection and heavy
processing requirements. Recently, [35, 22] proposed to watermark
convolutional neural networks for image classification, by embedding
information into their weights. While this is clear progress towards model
protection, this technique only allows the watermark to be extracted from a
network that one can access locally and in full.
Instead, we aim at allowing the extraction of the watermark from a neural
network (or any other machine learning model) that is operated remotely, and
available through a service API. To this end, we propose to mark the model's
action itself, slightly tweaking its decision frontiers so that a set of
specific queries conveys the desired information. In the present paper, we
formally introduce the problem and propose a novel zero-bit watermarking
algorithm that makes use of adversarial model examples. While limiting the loss
of performance of the protected model, this algorithm allows subsequent
extraction of the watermark using only a few queries. We evaluated the
approach on three neural networks designed for image classification, in the
context of the MNIST digit recognition task.
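
To make the remote extraction step concrete, the following is a minimal sketch of a zero-bit check performed through a prediction API alone. It is an illustration under stated assumptions, not the algorithm of the paper: the names count_mismatches and watermark_detected, the toy key set, the two toy models and the mismatch tolerance are all hypothetical, and the construction of the key set from adversarial examples is not shown.

```python
from typing import Callable, List, Tuple

# Hypothetical types: a query is a tuple of floats, a label is an int.
Query = Tuple[float, ...]
KeySet = List[Tuple[Query, int]]


def count_mismatches(predict: Callable[[Query], int], key: KeySet) -> int:
    """Count key queries whose answer differs from the expected label."""
    return sum(1 for query, expected in key if predict(query) != expected)


def watermark_detected(
    predict: Callable[[Query], int], key: KeySet, max_mismatches: int
) -> bool:
    """Zero-bit decision: the mark is either present or absent.

    max_mismatches is an assumed tolerance; how to choose it is not
    specified here.
    """
    return count_mismatches(predict, key) <= max_mismatches


# Toy key set (assumption): queries close to a decision frontier, paired
# with the labels that the marked model was tuned to return.
key: KeySet = [
    ((0.1, 0.9), 1),
    ((0.8, 0.2), 0),
    ((0.5, 0.5), 1),
    ((0.4, 0.6), 0),
]


def marked_model(query: Query) -> int:
    """Stand-in for a remote marked model: answers every key query as expected."""
    return dict(key)[query]


def unmarked_model(query: Query) -> int:
    """Stand-in for an unrelated remote model with an untouched frontier."""
    return 1 if query[1] > query[0] else 0


print(watermark_detected(marked_model, key, max_mismatches=1))
print(watermark_detected(unmarked_model, key, max_mismatches=1))
```

The check only calls predict, so it needs no access to the weights of the model being tested. In this toy setting the unmarked model disagrees with the key on the two queries nearest the frontier, which exceeds the assumed tolerance of one mismatch.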