Built-in Vulnerabilities to Imperceptible Adversarial Perturbations

arxiv

AI Security Portal bot

Information in the literature database is collected automatically.

Source

https://arxiv.org/abs/1806.07409

PDF

https://arxiv.org/pdf/1806.07409

Paper Information

Author: Thomas Tanay, Jerone T. A. Andrews, Lewis D. Griffin
Published: 6-20-2018
Updated: 5-8-2019
Affiliation: CoMPLEX, Dept. of Computer Science, University College London
Country: United Kingdom
Conference: Computing Research Repository (CoRR)

Labels Estimated by AI

Certified Robustness, Adversarial Training, Adversarial Learning

These labels were automatically added by AI and may be inaccurate.
For details, see About Literature Database.

Abstract

Designing models that are robust to small adversarial perturbations of their inputs has proven remarkably difficult. In this work we show that the reverse problem, making models more vulnerable, is surprisingly easy. After presenting some proofs of concept on MNIST, we introduce a generic tilting attack that injects vulnerabilities into the linear layers of pre-trained networks by increasing their sensitivity to components of low variance in the training data without affecting their performance on test data. We illustrate this attack on a multilayer perceptron trained on SVHN and use it to design a stand-alone adversarial module which we call a steganogram decoder. Finally, we show on CIFAR-10 that a poisoning attack with a poisoning rate as low as 0.1% can induce vulnerabilities to chosen imperceptible backdoor signals in state-of-the-art networks. Beyond their practical implications, these different results shed new light on the nature of the adversarial example phenomenon.