Abstract
Train-time data poisoning attacks threaten machine learning models by
injecting adversarially perturbed samples into the training set, leading to
misclassification. Current defense methods often reduce generalization
performance, are attack-specific, and impose significant training overhead. To
address these limitations, we
introduce a set of universal data purification methods using a stochastic
transform, $\Psi(x)$, realized via iterative Langevin dynamics of Energy-Based
Models (EBMs), Denoising Diffusion Probabilistic Models (DDPMs), or both. These
approaches purify poisoned data with minimal impact on classifier
generalization. Our specially trained EBMs and DDPMs provide state-of-the-art
defense against various attacks (including Narcissus, Bullseye Polytope,
Gradient Matching) on CIFAR-10, Tiny-ImageNet, and CINIC-10, without needing
attack- or classifier-specific information. We discuss performance trade-offs
and show that our methods remain highly effective even when the generative
models themselves are trained on poisoned or distributionally shifted data.
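
As a concrete illustration of the stochastic transform $\Psi(x)$ described
above, the following is a minimal PyTorch sketch of purification via iterative
Langevin dynamics on an energy-based model. This is not the paper's
implementation: the `ToyEnergyNet` architecture, step count, step size, and
noise scale are all illustrative assumptions.

```python
import torch
import torch.nn as nn

class ToyEnergyNet(nn.Module):
    """Small convolutional energy function E_theta(x) -> one scalar per image.
    A hypothetical stand-in for a trained EBM; not the paper's architecture."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.SiLU(),   # 32x32 -> 16x16
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.SiLU(),  # 16x16 -> 8x8
            nn.Flatten(),
            nn.Linear(64 * 8 * 8, 1),
        )

    def forward(self, x):
        return self.net(x).squeeze(-1)

def langevin_purify(energy_fn, x, n_steps=100, step_size=1e-3):
    """Stochastic transform Psi(x): n_steps of Langevin dynamics,
        x_{t+1} = x_t - step * dE/dx + sqrt(2 * step) * eps,  eps ~ N(0, I).
    Hyperparameters here are illustrative, not tuned values."""
    noise_scale = (2.0 * step_size) ** 0.5
    x = x.clone().detach()
    for _ in range(n_steps):
        x.requires_grad_(True)
        energy = energy_fn(x).sum()
        grad = torch.autograd.grad(energy, x)[0]
        with torch.no_grad():
            x = x - step_size * grad + noise_scale * torch.randn_like(x)
        x = x.clamp(0.0, 1.0).detach()  # keep pixels in a valid range
    return x

if __name__ == "__main__":
    ebm = ToyEnergyNet()
    poisoned_batch = torch.rand(8, 3, 32, 32)  # stand-in for CIFAR-10 images
    purified = langevin_purify(ebm, poisoned_batch)
    print(purified.shape)  # torch.Size([8, 3, 32, 32])
```

In a full pipeline of the kind the abstract describes, the purified images
would replace the original (possibly poisoned) training set before the
classifier is trained, so no attack- or classifier-specific knowledge enters
the defense itself.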