Generalizable Adversarial Attacks with Latent Variable Perturbation Modelling

AIにより推定されたラベル
Abstract

Adversarial attacks on deep neural networks traditionally rely on a constrained optimization paradigm, where an optimization procedure is used to obtain a single adversarial perturbation for a given input example. In this work we frame the problem as learning a distribution of adversarial perturbations, enabling us to generate diverse adversarial distributions given an unperturbed input. We show that this framework is domain-agnostic in that the same framework can be employed to attack different input domains with minimal modification. Across three diverse domains—images, text, and graphs—our approach generates whitebox attacks with success rates that are competitive with or superior to existing approaches, with a new state-of-the-art achieved in the graph domain. Finally, we demonstrate that our framework can efficiently generate a diverse set of attacks for a single given input, and is even capable of attacking unseen test instances in a zero-shot manner, exhibiting attack generalization.

タイトルとURLをコピーしました