Abstract
We present BadGD, a unified theoretical framework that exposes the
vulnerabilities of gradient descent algorithms through strategic backdoor
attacks. Backdoor attacks involve embedding malicious triggers into a training
dataset to disrupt the model's learning process. Our framework introduces three
novel constructs: Max RiskWarp Trigger, Max GradWarp Trigger, and Max
GradDistWarp Trigger, each designed to exploit specific aspects of gradient
descent by distorting empirical risk, deterministic gradients, and stochastic
gradients, respectively. We rigorously define clean and backdoored datasets and
provide mathematical formulations for assessing the distortions caused by these
malicious backdoor triggers. By measuring the impact of these triggers on the
model training procedure, our framework bridges existing empirical findings
with theoretical insights, demonstrating how a malicious party can exploit
gradient descent hyperparameters to maximize attack effectiveness. In
particular, we show that these exploitations can significantly alter the loss
landscape and gradient calculations, leading to compromised model integrity and
performance. This research underscores the severe threats posed by such
data-centric attacks and highlights the urgent need for robust defenses in
machine learning. BadGD sets a new standard for understanding and mitigating
adversarial manipulations, ensuring the reliability and security of AI systems.
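
As a rough illustration of the quantities the abstract refers to (the notation below is ours, not taken from the paper), let D denote a clean dataset and \tilde{D}(\delta) the backdoored dataset obtained by embedding a trigger \delta. The distortions that the Max RiskWarp and Max GradWarp triggers are described as maximizing can then be sketched as

\Delta_{\mathrm{risk}}(\delta) = \bigl|\hat{R}_{\tilde{D}(\delta)}(\theta) - \hat{R}_{D}(\theta)\bigr|,
\qquad
\Delta_{\mathrm{grad}}(\delta) = \bigl\|\nabla_\theta \hat{R}_{\tilde{D}(\delta)}(\theta) - \nabla_\theta \hat{R}_{D}(\theta)\bigr\|,

where \hat{R}_{D}(\theta) = \tfrac{1}{|D|}\sum_{(x,y)\in D} \ell\bigl(f_\theta(x), y\bigr) is the empirical risk of model parameters \theta under loss \ell. On this reading, the Max GradDistWarp trigger would target the analogous gap for a mini-batch (stochastic) gradient estimate rather than the full gradient. The paper's exact formulations may differ; this is only a minimal sketch of the kind of quantity being maximized.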