Abstract
Deep neural networks perform well on real-world data but are prone to adversarial perturbations: small changes in the input easily lead to misclassification. In this work, we propose an attack methodology not only for cases where the perturbations are measured by ℓp norms, but in fact for any adversarial dissimilarity metric with a closed proximal form. This includes, but is not limited to, ℓ1, ℓ2, and ℓ∞ perturbations; the ℓ0 counting “norm” (i.e., true sparseness); and the total variation seminorm, which is a (non-ℓp) convolutional dissimilarity measuring local pixel changes. Our approach is a natural extension of a recent adversarial attack method, and eliminates the differentiability requirement of the metric. We demonstrate our algorithm, ProxLogBarrier, on the MNIST, CIFAR10, and ImageNet-1k datasets. We consider undefended and defended models, and show that our algorithm easily transfers to various datasets. We observe that ProxLogBarrier outperforms a host of modern adversarial attacks specialized for the ℓ0 case. Moreover, by altering images in the total variation seminorm, we shed light on a new class of perturbations that exploit neighboring pixel information.
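To make the "closed proximal form" requirement concrete, the sketch below lists the standard closed-form proximal operators for a few of the metrics named in the abstract (ℓ1, ℓ0, and ℓ2). This is only an illustration of the kind of subroutine a proximal attack could call at each iteration; the function names and NumPy implementation are assumptions for exposition and do not reproduce the ProxLogBarrier algorithm itself.

```python
import numpy as np

# Illustrative closed-form proximal operators for common dissimilarity metrics.
# Each map solves prox_{t*g}(v) = argmin_x  g(x) + (1/(2t)) * ||x - v||_2^2
# for a particular choice of g; these are textbook results, not the authors' code.

def prox_l1(v, t):
    """Soft thresholding: prox of t * ||.||_1, shrinks entries toward zero."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def prox_l0(v, t):
    """Hard thresholding: prox of t * ||.||_0, zeroes out entries below sqrt(2t)."""
    return np.where(np.abs(v) > np.sqrt(2.0 * t), v, 0.0)

def prox_l2(v, t):
    """Block soft thresholding: prox of t * ||.||_2 (unsquared Euclidean norm)."""
    norm = np.linalg.norm(v)
    if norm <= t:
        return np.zeros_like(v)
    return (1.0 - t / norm) * v
```

Because each operator is available in closed form, a proximal-style attack can enforce the chosen dissimilarity on the perturbation without requiring the metric to be differentiable, which is the property the abstract highlights.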