Self-supervised learning is an emerging machine learning paradigm. In contrast
to supervised learning, which leverages high-quality labeled datasets,
self-supervised learning relies on unlabeled datasets to pre-train powerful
encoders that can then serve as feature extractors for various downstream
tasks. The enormous data and computational resources required for pre-training
make the encoders themselves valuable intellectual property of the model
owner. Recent research has shown that a machine learning model's copyright is
threatened by model stealing attacks, which aim to train a surrogate model
that mimics the behavior of a given model. We empirically show that pre-trained
encoders are highly vulnerable to model stealing attacks. However, most
current copyright protection algorithms, such as watermarking, concentrate on
classifiers, and the intrinsic challenges of protecting pre-trained encoders'
copyright remain largely unstudied. We fill this gap by
proposing SSLGuard, the first watermarking scheme for pre-trained encoders.
Given a clean pre-trained encoder, SSLGuard injects a watermark into it and
outputs a watermarked version. A shadow training technique is further applied to
preserve the watermark under potential model stealing attacks. Our extensive
evaluation shows that SSLGuard is effective in watermark injection and
verification, and it is robust against model stealing and other watermark
removal attacks such as input noising, output perturbing, overwriting, model
pruning, and fine-tuning.
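To make the threat concrete, the model stealing attack described above can be sketched as follows: the attacker queries the victim encoder as a black box on unlabeled inputs and trains a surrogate to reproduce the returned embeddings. The minimal NumPy sketch below is illustrative only; the linear victim architecture, the query budget, and the mean-squared embedding loss are assumptions for exposition, not the paper's exact setup.

```python
# Hedged sketch (not the paper's setup): stealing an embedding encoder by
# training a surrogate to match the victim's outputs on attacker-chosen queries.
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical victim: a fixed linear encoder mapping 32-d inputs to 8-d embeddings.
W_victim = rng.normal(size=(32, 8))

def victim_encode(x):
    """Black-box query interface: the attacker only sees returned embeddings."""
    return x @ W_victim

# Attacker's surrogate (same architecture assumed), trained on query/embedding pairs.
W_surrogate = np.zeros((32, 8))
X = rng.normal(size=(256, 32))   # attacker's unlabeled query set
Y = victim_encode(X)             # embeddings returned by the victim

lr = 0.05
losses = []
for _ in range(300):
    pred = X @ W_surrogate
    grad = X.T @ (pred - Y) / len(X)   # gradient of the mean-squared embedding loss
    W_surrogate -= lr * grad
    losses.append(float(np.mean((pred - Y) ** 2)))
```

After training, the surrogate's embeddings closely track the victim's on the query distribution, which is precisely the behavior a watermark such as SSLGuard's must survive.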