Abstract
The growing dependence on machine learning in real-world applications
emphasizes the importance of understanding and ensuring its safety. Backdoor
attacks pose a significant security risk due to their stealthy nature and
potentially serious consequences. Such attacks involve embedding triggers
within a learning model with the intention of causing malicious behavior when
an active trigger is present while maintaining regular functionality without
it. This paper evaluates the effectiveness of any backdoor attack incorporating
a constant trigger by establishing tight lower and upper bounds on the
performance of the compromised model on both clean and backdoor test data. The
developed theory answers a series of fundamental but previously underexplored
questions, including (1) what are the determining factors for a backdoor
attack's success, (2) what is the direction of the most effective backdoor
attack, and (3) when will a human-imperceptible trigger succeed. Our derived
understanding applies to both discriminative and generative models. We also
demonstrate the theory by conducting experiments using benchmark datasets and
state-of-the-art backdoor attack scenarios.