Abstract
Recent work has extensively shown that randomized perturbations of neural
networks can improve robustness to adversarial attacks. The literature is,
however, lacking a detailed compare-and-contrast of the latest proposals to
understand what classes of perturbations work, when they work, and why they
work. We contribute a detailed evaluation that addresses these questions and
benchmarks perturbation-based defenses consistently. In particular, we show
five main results: (1) all input perturbation defenses, whether random or
deterministic, are equivalent in their efficacy, (2) attacks transfer between
perturbation defenses so the attackers need not know the specific type of
defense -- only that it involves perturbations, (3) a tuned sequence of noise
layers across a network provides the best empirical robustness, (4)
perturbation-based defenses offer almost no robustness to adaptive attacks
unless these perturbations are observed during training, and (5) adversarial
examples in a close neighborhood of original inputs show an elevated
sensitivity to perturbations in first and second-order analyses.