Abstract
Existing work in trustworthy machine learning primarily focuses on
single-input adversarial perturbations. In many real-world attack scenarios,
input-agnostic adversarial attacks, e.g. universal adversarial perturbations
(UAPs), are much more feasible. Current certified training methods train models
robust to single-input perturbations but achieve suboptimal clean and UAP
accuracy, which limits their usefulness in practical applications. We
propose a novel method, CITRUS, for certified training of networks robust
against UAP attackers. We show in an extensive evaluation across different
datasets, architectures, and perturbation magnitudes that our method
outperforms traditional certified training methods on standard accuracy (up to
10.3%) and achieves SOTA performance on the more practical certified UAP
accuracy metric.
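The distinction the abstract draws between single-input perturbations and universal adversarial perturbations can be illustrated with a small sketch. This is not CITRUS and it certifies nothing: it enumerates a coarse grid of perturbations inside an l_inf ball, so it gives only an empirical estimate. The linear classifier, the four data points, and all function names are hypothetical, chosen only to show that an attacker who must share one perturbation across all inputs is weaker than one who picks a perturbation per input.

```python
from itertools import product

# Hypothetical toy setup: a 2-D linear classifier and four labelled points.
W = (1.0, -1.0)
B = 0.0
DATA = [((0.1, 0.0), 1), ((0.0, 0.1), 0), ((1.0, 0.0), 1), ((0.0, 1.0), 0)]


def predict(w, b, x):
    """Return 1 if the linear score w.x + b is non-negative, else 0."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if score >= 0 else 0


def shift(x, delta):
    """Add a perturbation to an input, coordinate by coordinate."""
    return [xi + di for xi, di in zip(x, delta)]


def grid(eps, dim, steps=5):
    """Candidate perturbations on a regular grid inside the l_inf ball of radius eps."""
    ticks = [-eps + 2 * eps * i / (steps - 1) for i in range(steps)]
    return list(product(ticks, repeat=dim))


def clean_accuracy(w, b, data):
    """Accuracy on unperturbed inputs."""
    return sum(predict(w, b, x) == y for x, y in data) / len(data)


def single_input_robust_accuracy(w, b, data, eps):
    """Fraction of inputs that stay correct under every grid perturbation.

    The attacker may choose a different perturbation for each input.
    """
    deltas = grid(eps, len(w))
    robust = sum(
        all(predict(w, b, shift(x, d)) == y for d in deltas) for x, y in data
    )
    return robust / len(data)


def uap_accuracy(w, b, data, eps):
    """Worst accuracy over grid perturbations when one perturbation is shared by all inputs."""
    deltas = grid(eps, len(w))
    return min(
        sum(predict(w, b, shift(x, d)) == y for x, y in data) / len(data)
        for d in deltas
    )


print("clean:", clean_accuracy(W, B, DATA))
print("single-input robust:", single_input_robust_accuracy(W, B, DATA, 0.2))
print("UAP:", uap_accuracy(W, B, DATA, 0.2))
```

On this toy data the script reports a clean accuracy of 1.0, a single-input robust accuracy of 0.5, and a UAP accuracy of 0.75. The two points near the decision boundary can each be flipped individually, but no single shared perturbation flips both, which is why the universal setting admits a higher accuracy than the single-input setting.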
External Datasets
MNIST
CIFAR-10
TinyImageNet
References
International conference on machine learning
Synthesizing robust adversarial examples
Athalye, A., Engstrom, L., Ilyas, A., Kwok, K.
Published: 2018
8th International Conference on Learning Representations (ICLR 2020) (virtual)
Adversarial training and provable defenses: Bridging the gap
2021 IEEE International Conference on Multimedia and Expo (ICME)
Universal adversarial training with class-wise perturbations
Benz, P., Zhang, C., Karjauv, A., Kweon, I.S.
Published: 2021
The Eleventh International Conference on Learning Representations, ICLR
(certified!!) adversarial robustness for free!
Carlini, N., Tramer, F., Dvijotham, K. D., Rice, L., Sun, M., Kolter, J. Z.
Published: 2023
IEEE Symposium on Security and Privacy
Towards Evaluating the Robustness of Neural Networks
Carlini, N., Wagner, D.
Published: 8.17.2016
Neural networks provide state-of-the-art results for most machine learning
tasks. Unfortunately, neural networks are vulnerable to adversarial examples:
given an input $x$ and any target classification $t$, it is possible to find a
new input $x'$ that is similar to $x$ but classified as $t$. This makes it
difficult to apply neural networks in security-critical areas. Defensive
distillation is a recently proposed approach that can take an arbitrary neural
network, and increase its robustness, reducing the success rate of current
attacks' ability to find adversarial examples from $95\%$ to $0.5\%$.
In this paper, we demonstrate that defensive distillation does not
significantly increase the robustness of neural networks by introducing three
new attack algorithms that are successful on both distilled and undistilled
neural networks with $100\%$ probability. Our attacks are tailored to three
distance metrics used previously in the literature, and when compared to
previous adversarial example generation algorithms, our attacks are often much
more effective (and never worse). Furthermore, we propose using high-confidence
adversarial examples in a simple transferability test we show can also be used
to break defensive distillation. We hope our attacks will be used as a
benchmark in future defense attempts to create neural networks that resist
adversarial examples.