As the curation of data for machine learning becomes increasingly automated,
dataset tampering is a mounting threat. Backdoor attackers tamper with training
data to embed a vulnerability in models that are trained on that data. This
vulnerability is then activated at inference time by placing a "trigger" into
the model's input. Typical backdoor attacks insert the trigger directly into
the training data, but this can make the attack visible upon
inspection. In contrast, the Hidden Trigger Backdoor Attack achieves poisoning
without placing a trigger into the training data at all. However, this hidden
trigger attack is ineffective at poisoning neural networks trained from
scratch. We develop a new hidden trigger attack, Sleeper Agent, which employs
gradient matching, data selection, and target model re-training during the
crafting process. Sleeper Agent is the first hidden trigger backdoor attack to
be effective against neural networks trained from scratch. We demonstrate its
effectiveness on ImageNet and in black-box settings. Our implementation code
can be found at https://github.com/hsouri/Sleeper-Agent.
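As a rough illustration of the gradient-matching idea underlying such crafting procedures, the PyTorch sketch below optimizes a small, bounded perturbation on clean-labeled poison images so that the training gradient they induce on a surrogate model aligns with the gradient of the attacker's objective (triggered inputs classified as the target label). The toy model, patch trigger, perturbation budget, and all variable names are illustrative assumptions, not the paper's implementation; the full method additionally performs poison data selection and periodic re-training of the surrogate model.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# --- Illustrative setup: stand-ins for a real surrogate model and data ---
torch.manual_seed(0)
model = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(),
                      nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(8, 10))
poison_images = torch.rand(4, 3, 32, 32)    # images the attacker may perturb
poison_labels = torch.tensor([1, 1, 1, 1])  # their original (clean) labels
source_images = torch.rand(4, 3, 32, 32)    # held-out source-class images
target_labels = torch.tensor([7, 7, 7, 7])  # label desired at inference time
trigger = torch.zeros(3, 32, 32)
trigger[:, :4, :4] = 1.0                    # a simple patch trigger
triggered = (source_images + trigger).clamp(0, 1)
eps = 16 / 255                              # perturbation budget

def grad_match_loss(poisoned):
    """1 - cosine similarity between the gradient induced by the poisoned
    training batch and the gradient of the adversarial loss that pushes
    triggered inputs toward the target label."""
    params = [p for p in model.parameters() if p.requires_grad]
    adv = torch.autograd.grad(
        F.cross_entropy(model(triggered), target_labels), params)
    poi = torch.autograd.grad(
        F.cross_entropy(model(poisoned), poison_labels), params,
        create_graph=True)
    num = sum((a * p).sum() for a, p in zip(adv, poi))
    den = (sum(a.pow(2).sum() for a in adv).sqrt() *
           sum(p.pow(2).sum() for p in poi).sqrt() + 1e-12)
    return 1 - num / den

# Optimize a bounded perturbation on the poison images by descending the
# gradient-matching loss; the labels of the poison images never change.
delta = torch.zeros_like(poison_images, requires_grad=True)
opt = torch.optim.Adam([delta], lr=0.05)
for _ in range(50):
    opt.zero_grad()
    loss = grad_match_loss((poison_images + delta).clamp(0, 1))
    loss.backward()
    opt.step()
    with torch.no_grad():
        delta.clamp_(-eps, eps)  # keep the perturbation imperceptible
```

After crafting, the perturbed poison images would be injected into the victim's training set; a model trained on them is then expected to misclassify triggered source-class inputs at test time while behaving normally on clean data.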