These labels were automatically added by AI and may be inaccurate. For details, see About Literature Database.
Abstract
Designing models that are robust to small adversarial perturbations of their
inputs has proven remarkably difficult. In this work we show that the reverse
problem---making models more vulnerable---is surprisingly easy. After
presenting some proofs of concept on MNIST, we introduce a generic tilting
attack that injects vulnerabilities into the linear layers of pre-trained
networks by increasing their sensitivity to components of low variance in the
training data without affecting their performance on test data. We illustrate
this attack on a multilayer perceptron trained on SVHN and use it to design a
stand-alone adversarial module which we call a steganogram decoder. Finally, we
show on CIFAR-10 that a poisoning attack with a poisoning rate as low as 0.1%
can induce vulnerabilities to chosen imperceptible backdoor signals in
state-of-the-art networks. Beyond their practical implications, these different
results shed new light on the nature of the adversarial example phenomenon.