Label-Consistent Backdoor Attacks

Information in the literature database is collected automatically.

Source

https://arxiv.org/abs/1912.02771

PDF

https://arxiv.org/pdf/1912.02771

Paper Information

Authors: Alexander Turner, Dimitris Tsipras, Aleksander Madry
Published: 2019-12-06
Updated: 2019-12-07
Affiliation: MIT
Country: United States of America

Labels Estimated by AI

Backdoor Attack, Poisoning, Adversarial Example

These labels were automatically added by AI and may be inaccurate.

Abstract

Deep neural networks have been demonstrated to be vulnerable to backdoor attacks. Specifically, by injecting a small number of maliciously constructed inputs into the training set, an adversary is able to plant a backdoor into the trained model. This backdoor can then be activated during inference by a backdoor trigger to fully control the model's behavior. While such attacks are very effective, they crucially rely on the adversary injecting arbitrary inputs that are (often blatantly) mislabeled. Such samples would raise suspicion upon human inspection, potentially revealing the attack. Thus, for backdoor attacks to remain undetected, it is crucial that they maintain label-consistency: the condition that injected inputs are consistent with their labels. In this work, we leverage adversarial perturbations and generative models to execute efficient, yet label-consistent, backdoor attacks. Our approach is based on injecting inputs that appear plausible, yet are hard to classify, hence causing the model to rely on the (easier-to-learn) backdoor trigger.
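
Illustrative sketch (not the authors' code): the poisoning recipe described in the abstract could look roughly like the NumPy example below. It takes some training images that already belong to the target class, perturbs them so that they become harder to classify, stamps a small trigger patch on them, and leaves their labels untouched. Everything concrete here is an assumption made for illustration: the toy linear softmax model W, the single signed-gradient step used as a simple stand-in for the adversarial perturbation, the 3x3 corner patch used as the trigger, and the helper names fgsm_perturb, stamp_trigger and poison_label_consistent. The generative-model variant mentioned in the abstract is not shown.

```python
import numpy as np


def fgsm_perturb(x, y, W, eps):
    """One signed-gradient step that raises the cross-entropy loss of a toy
    linear softmax classifier (logits = flattened_x @ W). Assumption: this is
    only a stand-in for the adversarial perturbation named in the abstract."""
    flat = x.reshape(len(x), -1)
    logits = flat @ W
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    probs = np.exp(logits)
    probs /= probs.sum(axis=1, keepdims=True)
    onehot = np.eye(W.shape[1])[y]
    grad = (probs - onehot) @ W.T  # gradient of the loss w.r.t. the input
    adv = np.clip(flat + eps * np.sign(grad), 0.0, 1.0)
    return adv.reshape(x.shape)


def stamp_trigger(x, size=3, value=1.0):
    """Overwrite the bottom-right corner with a bright patch (hypothetical trigger)."""
    out = x.copy()
    out[:, -size:, -size:] = value
    return out


def poison_label_consistent(images, labels, target, W,
                            fraction=0.5, eps=8 / 255, seed=0):
    """Poison a fraction of the target-class images without changing any label."""
    rng = np.random.default_rng(seed)
    idx = np.flatnonzero(labels == target)
    chosen = rng.choice(idx, size=int(len(idx) * fraction), replace=False)
    poisoned = images.copy()
    hard = fgsm_perturb(images[chosen], labels[chosen], W, eps)
    poisoned[chosen] = stamp_trigger(hard)
    return poisoned, labels.copy(), chosen


# Tiny synthetic demo: 20 grayscale 8x8 "images" with pixel values in [0, 1).
demo_rng = np.random.default_rng(0)
images = demo_rng.random((20, 8, 8))
labels = np.arange(20) % 2
W = demo_rng.normal(size=(64, 2))
poisoned, new_labels, chosen = poison_label_consistent(images, labels, target=1, W=W)
```

In this sketch the returned labels are identical to the input labels, so a human inspecting a poisoned sample would still see an image of the class its label names. A model trained on the poisoned set would, under the abstract's argument, find the perturbed images hard to fit from their content and lean on the patch instead.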

External Datasets

CIFAR-10

CINIC-10