Generalizable Adversarial Attacks with Latent Variable Perturbation Modelling

Authors: Avishek Joey Bose, Andre Cianflone, William L. Hamilton | Published: 2019-05-26 | Updated: 2020-01-20

2019.05.262025.05.28

Authors: Avishek Joey Bose, Andre Cianflone, William L. Hamilton
Published: 2019-05-26 | Updated: 2020-01-20

Source: https://arxiv.org/abs/1905.10864

PDF: https://arxiv.org/pdf/1905.10864

Labels Predicted by AI

Vulnerability of Adversarial Examples Adversarial Example Impact of Generalization

Please note that these labels were automatically added by AI. Therefore, they may not be entirely accurate.
For more details, please see the About the Literature Database page.

Abstract

Adversarial attacks on deep neural networks traditionally rely on a constrained optimization paradigm, where an optimization procedure is used to obtain a single adversarial perturbation for a given input example. In this work we frame the problem as learning a distribution of adversarial perturbations, enabling us to generate diverse adversarial distributions given an unperturbed input. We show that this framework is domain-agnostic in that the same framework can be employed to attack different input domains with minimal modification. Across three diverse domains—images, text, and graphs—our approach generates whitebox attacks with success rates that are competitive with or superior to existing approaches, with a new state-of-the-art achieved in the graph domain. Finally, we demonstrate that our framework can efficiently generate a diverse set of attacks for a single given input, and is even capable of attacking unseen test instances in a zero-shot manner, exhibiting attack generalization.