Reliable Model Watermarking: Defending Against Theft without Compromising on Evasion
Abstract
With the rise of Machine Learning as a Service (MLaaS) platforms, safeguarding the intellectual property of deep learning models is becoming paramount. Among various protective measures, trigger set watermarking has emerged as a flexible and effective strategy for preventing unauthorized model distribution. However, this paper identifies an inherent flaw in the current paradigm of trigger set watermarking: evasion adversaries can readily exploit the shortcuts created when models memorize watermark samples that deviate from the main task distribution, significantly impairing generalization in adversarial settings. To counteract this, we leverage diffusion models to synthesize unrestricted adversarial examples as trigger sets. By training the model to accurately recognize them, unique watermark behaviors are promoted through knowledge injection rather than error memorization, thus avoiding exploitable shortcuts. Furthermore, we find that the resistance of current trigger set watermarking to removal attacks relies primarily on significantly damaging the decision boundaries during embedding, intertwining unremovability with adverse impacts. By optimizing the knowledge transfer properties of protected models, our approach conveys watermark behaviors to extraction surrogates without aggressively perturbing decision boundaries. Experimental results on the CIFAR-10/100 and Imagenette datasets demonstrate the effectiveness of our method, showing not only improved robustness against evasion adversaries but also superior resistance to watermark removal attacks compared with state-of-the-art solutions.
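To make the trigger set paradigm concrete, the sketch below shows the generic embed-then-verify loop in PyTorch: the owner fine-tunes the model jointly on the main task and a secret trigger set, and later claims ownership if a suspect model's trigger-set accuracy exceeds a threshold. This is a minimal illustration of the general paradigm only; the function names, the simple joint loss, and all hyperparameters are assumptions, not the paper's actual method, which instead synthesizes its trigger set from diffusion-generated unrestricted adversarial examples rather than out-of-distribution memorization samples.

```python
import torch
import torch.nn.functional as F


def embed_watermark(model, task_loader, trigger_x, trigger_y,
                    epochs=5, lr=1e-4, wm_weight=1.0, device="cpu"):
    """Fine-tune `model` to fit the main task while also predicting
    the owner-chosen labels on the secret trigger set.

    Illustrative joint objective; real schemes differ in how the
    trigger set is built and how the two losses are balanced.
    """
    model.to(device).train()
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    trigger_x, trigger_y = trigger_x.to(device), trigger_y.to(device)
    for _ in range(epochs):
        for x, y in task_loader:
            x, y = x.to(device), y.to(device)
            task_loss = F.cross_entropy(model(x), y)
            # Watermark term: push the model to memorize (or, as this
            # paper advocates, genuinely recognize) the trigger samples.
            wm_loss = F.cross_entropy(model(trigger_x), trigger_y)
            loss = task_loss + wm_weight * wm_loss
            opt.zero_grad()
            loss.backward()
            opt.step()
    return model


@torch.no_grad()
def verify_watermark(model, trigger_x, trigger_y,
                     threshold=0.9, device="cpu"):
    """Ownership check: flag a suspect model whose accuracy on the
    secret trigger set exceeds `threshold` (an assumed cutoff)."""
    model.to(device).eval()
    preds = model(trigger_x.to(device)).argmax(dim=1)
    acc = (preds == trigger_y.to(device)).float().mean().item()
    return acc >= threshold, acc
```

The flaw the abstract highlights lives in how `trigger_x` is chosen: if the trigger samples lie off the task distribution and are fit by rote memorization, an evasion adversary can detect and exploit the resulting shortcuts, which motivates the paper's use of diffusion-synthesized adversarial examples that the model learns as genuine knowledge instead.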