Deep neural networks have been demonstrated to be vulnerable to backdoor
attacks. Specifically, by injecting a small number of maliciously constructed
inputs into the training set, an adversary is able to plant a backdoor into the
trained model. At inference time, the adversary can then activate the backdoor
with a trigger and gain full control over the model's behavior. While such attacks are
very effective, they crucially rely on the adversary injecting arbitrary inputs
that are---often blatantly---mislabeled. Such samples would raise suspicion
upon human inspection, potentially revealing the attack. Thus, for backdoor
attacks to remain undetected, it is crucial that they maintain
label-consistency---the condition that injected inputs are consistent with
their labels. In this work, we leverage adversarial perturbations and
generative models to execute efficient, yet label-consistent, backdoor attacks.
Our approach is based on injecting inputs that appear plausible yet are hard
to classify, causing the model to rely instead on the (easier-to-learn)
backdoor trigger; a sketch of this construction is given below.
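To make the construction concrete, the following is a minimal PyTorch-style
sketch of the adversarial-perturbation variant. The surrogate model, the
trigger patch, and the PGD hyperparameters (eps, alpha, steps) are
illustrative assumptions rather than the exact settings used in our
experiments; x is assumed to be a batch of correctly labeled images in [0,1]
and y their true labels.

\begin{verbatim}
import torch
import torch.nn.functional as F

def make_label_consistent_poison(model, x, y, trigger,
                                 eps=8/255, alpha=2/255, steps=10):
    # Make correctly-labeled inputs hard to classify via a PGD-style
    # adversarial perturbation against a surrogate model, then stamp a
    # small trigger patch. The label y is left unchanged (label-consistent).
    model.eval()
    x_adv = x.clone().detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)       # loss w.r.t. true labels
        grad = torch.autograd.grad(loss, x_adv)[0]
        x_adv = x_adv.detach() + alpha * grad.sign()  # ascent: harder to classify
        x_adv = torch.min(torch.max(x_adv, x - eps), x + eps).clamp(0.0, 1.0)
    h, w = trigger.shape[-2:]
    x_poison = x_adv.clone()
    x_poison[..., -h:, -w:] = trigger                 # bottom-right trigger patch
    return x_poison, y                                # labels stay consistent
\end{verbatim}

The poisoned pairs (x_poison, y) can then be mixed into the training set; a
generative-model variant would replace the PGD step with an interpolation
toward other classes in a GAN's latent space.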