Backdoor data poisoning attacks have recently been demonstrated in computer
vision research as a potential safety risk for machine learning (ML) systems.
Traditional data poisoning attacks manipulate training data to degrade the
reliability of an ML model, whereas backdoor data poisoning attacks preserve
normal system performance unless the ML model is presented with an input
containing an embedded "trigger" that elicits a predetermined response
advantageous to the adversary. Our work builds upon prior backdoor data
poisoning research for ML image classifiers and systematically assesses
different experimental conditions,
including types of trigger patterns, persistence of trigger patterns during
retraining, poisoning strategies, architectures (ResNet-50, NasNet,
NasNet-Mobile), datasets (Flowers, CIFAR-10), and potential defensive
regularization techniques (Contrastive Loss, Logit Squeezing, Manifold Mixup,
Soft-Nearest-Neighbors Loss). Experiments yield four key findings. First, the
success rate of backdoor poisoning attacks varies widely depending on several
factors, including model architecture, trigger pattern, and regularization
technique. Second, we find that poisoned models are hard to detect through
performance inspection alone. Third, regularization typically reduces the
backdoor success rate, although it can have no effect or even slightly increase
it, depending on the form of regularization. Finally, backdoors inserted
through data poisoning can be rendered ineffective after just a few epochs of
additional training on a small set of clean data, without degrading the model's
performance.
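
To make the attack described above concrete, the sketch below shows one common
form of dirty-label backdoor poisoning: a small trigger patch is stamped onto a
fraction of the training images, and those images are relabeled to an
attacker-chosen target class. The function names, patch placement, poison
fraction, and patch size are illustrative assumptions, not the exact settings
evaluated in the experiments above.

```python
import numpy as np

def poison_dataset(images, labels, target_class, poison_fraction=0.05,
                   patch_size=4, patch_value=1.0, seed=0):
    """Stamp a square trigger into a random subset of images and relabel
    those images to the attacker-chosen target class.

    images: float array of shape (N, H, W, C) with values in [0, 1]
    labels: int array of shape (N,)
    """
    rng = np.random.default_rng(seed)
    images = images.copy()
    labels = labels.copy()

    n_poison = int(len(images) * poison_fraction)
    idx = rng.choice(len(images), size=n_poison, replace=False)

    # Place the trigger patch in the bottom-right corner of each poisoned image.
    images[idx, -patch_size:, -patch_size:, :] = patch_value
    # Dirty-label attack: poisoned images are assigned the target class.
    labels[idx] = target_class
    return images, labels, idx

def apply_trigger(images, patch_size=4, patch_value=1.0):
    """Stamp the same trigger onto test images to measure attack success."""
    images = images.copy()
    images[:, -patch_size:, -patch_size:, :] = patch_value
    return images
```

Attack success rate would then be the fraction of triggered test images
(excluding those whose true class is already the target) that the poisoned
model assigns to the target class, while clean accuracy is measured on
unmodified test images.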
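
The fourth finding suggests a simple mitigation: continue training the poisoned
model briefly on a small clean dataset. A minimal Keras-style sketch is given
below, assuming a classifier with integer labels; the optimizer, learning rate,
epoch count, and batch size are illustrative assumptions rather than the values
used in the study.

```python
import tensorflow as tf

def finetune_on_clean_data(poisoned_model, clean_images, clean_labels,
                           epochs=3, learning_rate=1e-4):
    """Fine-tune a (possibly backdoored) classifier for a few epochs on a
    small clean dataset, which the findings above indicate can disable the
    backdoor without degrading accuracy on clean inputs."""
    poisoned_model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate),
        loss="sparse_categorical_crossentropy",
        metrics=["accuracy"],
    )
    poisoned_model.fit(clean_images, clean_labels,
                       epochs=epochs, batch_size=64, verbose=1)
    return poisoned_model
```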