Abstract
With the rise of Machine Learning as a Service (MLaaS) platforms, safeguarding
the intellectual property of deep learning models is becoming paramount. Among
various protective measures, trigger set watermarking has emerged as a flexible
and effective strategy for preventing unauthorized model distribution. However,
this paper identifies an inherent flaw in the current paradigm of trigger set
watermarking: evasion adversaries can readily exploit the shortcuts created by
models memorizing watermark samples that deviate from the main task
distribution, significantly impairing their generalization in adversarial
settings. To counteract this, we leverage diffusion models to synthesize
unrestricted adversarial examples as trigger sets. By training the model to
accurately recognize them, unique watermark behaviors are promoted through
knowledge injection rather than error memorization, thus avoiding exploitable
shortcuts. Furthermore, we uncover that the resistance of current trigger set
watermarking against removal attacks primarily relies on significantly damaging
the decision boundaries during embedding, intertwining unremovability with
adverse impacts. By optimizing the knowledge transfer properties of protected
models, our approach conveys watermark behaviors to extraction surrogates
without aggressively perturbing the decision boundaries. Experimental results on
CIFAR-10/100 and Imagenette datasets demonstrate the effectiveness of our
method, showing not only improved robustness against evasion adversaries but
also superior resistance to watermark removal attacks compared to
state-of-the-art solutions.
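To make the trigger set paradigm discussed in the abstract concrete, the sketch below illustrates generic trigger-set watermark embedding and verification: a model is trained jointly on the main task and on secret trigger samples with designated labels, and ownership is later checked by measuring accuracy on those triggers. This is a minimal illustration assuming PyTorch; the model, data loader, and `trigger_images`/`trigger_labels` tensors are hypothetical placeholders, and the sketch does not reproduce this paper's diffusion-based trigger synthesis or knowledge-injection scheme.

```python
# Minimal sketch of generic trigger-set watermarking (not this paper's method).
# Assumes PyTorch; `model`, `task_loader`, and the trigger tensors are placeholders.
import torch
import torch.nn.functional as F

def embed_watermark(model, task_loader, trigger_images, trigger_labels,
                    epochs=5, lr=1e-3, trigger_weight=1.0, device="cpu"):
    """Jointly train on the main task and the trigger set so the model
    learns the designated trigger labels (the watermark behavior)."""
    model.to(device).train()
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    trigger_images = trigger_images.to(device)
    trigger_labels = trigger_labels.to(device)
    for _ in range(epochs):
        for x, y in task_loader:
            x, y = x.to(device), y.to(device)
            loss = F.cross_entropy(model(x), y)
            # Add the trigger-set objective so the watermark behavior is embedded.
            loss = loss + trigger_weight * F.cross_entropy(
                model(trigger_images), trigger_labels)
            opt.zero_grad()
            loss.backward()
            opt.step()
    return model

@torch.no_grad()
def verify_watermark(model, trigger_images, trigger_labels,
                     threshold=0.9, device="cpu"):
    """Ownership check: flag the suspect model if its accuracy on the
    secret trigger set exceeds a preset threshold."""
    model.to(device).eval()
    preds = model(trigger_images.to(device)).argmax(dim=1)
    acc = (preds == trigger_labels.to(device)).float().mean().item()
    return acc >= threshold, acc
```

In this generic form, trigger samples that lie far outside the task distribution are exactly the memorized shortcuts the abstract warns about; the paper's proposal is to replace them with diffusion-synthesized unrestricted adversarial examples that the model learns to recognize as genuine knowledge rather than memorized errors.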
External Datasets
CIFAR-10
CIFAR-100
Imagenette
References
27th USENIX Security Symposium (USENIX Security)
Turning your weakness into a strength: Watermarking deep neural networks by backdooring
Y. Adi, C. Baum, M. Cisse, B. Pinkas, J. Keshet
Published: 2018
AAAI Conference on Artificial Intelligence
On Lipschitz regularization of convolutional layers using Toeplitz matrix theory
Alexandre Araujo, Benjamin Negrevergne, Yann Chevaleyre, Jamal Atif
Published: 2021
International Conference on Machine Learning
Certified neural network watermarks with randomized smoothing
Arpit Bansal, Ping-yeh Chiang, Michael J Curry, Rajiv Jain, Curtis Wigington, Varun Manjunatha, John P Dickerson, Tom Goldstein
Published: 2022
OpenAI Technical Report
Language models are few-shot learners
T. B. Brown, B. Mann, N. Ryder, M. Subbiah, J. Kaplan, P. Dhariwal, A. Neelakantan, P. Shyam, G. Sastry, A. Askell, S. Agarwal, A. Herbert-Voss, G. Krueger, T. Henighan, R. Child, A. Ramesh, D. M. Ziegler, J. Wu, C. Winter, C. Hesse, M. Chen, E. Sigler, M. Litwin, S. Gray, B. Chess, J. Clark, C. Berner, S. McCandlish, A. Radford, I. Sutskever, D. Amodei
Published: 2020
IEEE Symposium on Security and Privacy
Towards Evaluating the Robustness of Neural Networks
Nicholas Carlini, David Wagner
Published: 2017
Neural networks provide state-of-the-art results for most machine learning
tasks. Unfortunately, neural networks are vulnerable to adversarial examples:
given an input $x$ and any target classification $t$, it is possible to find a
new input $x'$ that is similar to $x$ but classified as $t$. This makes it
difficult to apply neural networks in security-critical areas. Defensive
distillation is a recently proposed approach that can take an arbitrary neural
network, and increase its robustness, reducing the success rate of current
attacks' ability to find adversarial examples from $95\%$ to $0.5\%$.
In this paper, we demonstrate that defensive distillation does not
significantly increase the robustness of neural networks by introducing three
new attack algorithms that are successful on both distilled and undistilled
neural networks with $100\%$ probability. Our attacks are tailored to three
distance metrics used previously in the literature, and when compared to
previous adversarial example generation algorithms, our attacks are often much
more effective (and never worse). Furthermore, we propose using high-confidence
adversarial examples in a simple transferability test we show can also be used
to break defensive distillation. We hope our attacks will be used as a
benchmark in future defense attempts to create neural networks that resist
adversarial examples.
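The targeted-attack setting described in this reference's abstract (given an input x and target class t, find a nearby x' that the model classifies as t) can be made concrete with a short optimization sketch. The code below is a simplified illustration in PyTorch of a Carlini-Wagner-style L2 targeted attack; it omits the authors' binary search over the trade-off constant c and other refinements, and the model and single input x are assumed placeholders with pixel values in [0, 1].

```python
# Simplified targeted attack in the spirit of the Carlini-Wagner L2 formulation
# (illustrative only, not the authors' exact algorithm: no binary search over c).
# Assumes PyTorch; `model` and `x` (shape [1, C, H, W], values in [0, 1]) are placeholders.
import torch

def cw_l2_targeted(model, x, target, c=1.0, kappa=0.0, steps=200, lr=0.01):
    """Find x' close to x (in L2 distance) that the model classifies as `target`."""
    model.eval()
    x = x.clamp(1e-6, 1 - 1e-6)
    # Change of variables: x' = 0.5 * (tanh(w) + 1) keeps x' inside [0, 1].
    w = torch.atanh(2 * x - 1).clone().detach().requires_grad_(True)
    opt = torch.optim.Adam([w], lr=lr)
    target = torch.as_tensor([target], device=x.device)
    for _ in range(steps):
        x_adv = 0.5 * (torch.tanh(w) + 1)
        logits = model(x_adv)
        # f(x') = max(max_{i != t} Z_i - Z_t, -kappa): becomes negative once
        # the target class wins by a margin of at least kappa.
        target_logit = logits.gather(1, target.view(-1, 1)).squeeze(1)
        other = logits.clone()
        other.scatter_(1, target.view(-1, 1), float("-inf"))
        best_other = other.max(dim=1).values
        f_loss = torch.clamp(best_other - target_logit, min=-kappa)
        # Minimize the L2 perturbation plus the weighted misclassification term.
        l2 = ((x_adv - x) ** 2).flatten(1).sum(dim=1)
        loss = (l2 + c * f_loss).sum()
        opt.zero_grad()
        loss.backward()
        opt.step()
    return (0.5 * (torch.tanh(w) + 1)).detach()
```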