These labels were automatically added by AI and may be inaccurate. For details, see About Literature Database.
Abstract
Deep neural networks have been demonstrated to be vulnerable to backdoor
attacks. Specifically, by injecting a small number of maliciously constructed
inputs into the training set, an adversary is able to plant a backdoor into the
trained model. This backdoor can then be activated during inference by a
backdoor trigger to fully control the model's behavior. While such attacks are
very effective, they crucially rely on the adversary injecting arbitrary inputs
that are---often blatantly---mislabeled. Such samples would raise suspicion
upon human inspection, potentially revealing the attack. Thus, for backdoor
attacks to remain undetected, it is crucial that they maintain
label-consistency---the condition that injected inputs are consistent with
their labels. In this work, we leverage adversarial perturbations and
generative models to execute efficient, yet label-consistent, backdoor attacks.
Our approach is based on injecting inputs that appear plausible, yet are hard
to classify, hence causing the model to rely on the (easier-to-learn) backdoor
trigger.