Learning Universal Adversarial Perturbations with Generative Models

Authors: Jamie Hayes, George Danezis | Published: 2017-08-17 | Updated: 2018-01-05

2017.08.17 | 2025.04.03


Source: https://arxiv.org/abs/1708.05207

PDF: https://arxiv.org/pdf/1708.05207

Labels estimated by AI

Attack methods | Model robustness guarantees | Adversarial examples

Note: These labels were added automatically by AI and may therefore be inaccurate.
For details, see "About the literature database".

Abstract

Neural networks are known to be vulnerable to adversarial examples: inputs that have been intentionally perturbed to remain visually similar to the source input but cause a misclassification. It was recently shown that, given a dataset and a classifier, there exist so-called universal adversarial perturbations: a single perturbation that causes a misclassification when applied to any input. In this work, we introduce universal adversarial networks, generative networks capable of fooling a target classifier when their generated output is added to a clean sample from a dataset. We show that this technique improves on known universal adversarial attacks.
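
To make the idea of a universal perturbation concrete, here is a minimal sketch in plain Python. It is an illustration only, not the authors' implementation. The toy classifier, the hard-coded `raw_output` standing in for a generator's output, the L-infinity budget `eps`, and all function names are assumptions introduced here. It shows one fixed perturbation being added to every clean sample, and measures the fraction of samples whose predicted label changes.

```python
from typing import Callable, List, Sequence

Vector = List[float]


def scale_to_linf_budget(raw: Sequence[float], eps: float) -> Vector:
    """Rescale a raw vector so its largest absolute entry equals eps.

    Assumption: the perturbation is constrained by an L-infinity budget.
    """
    peak = max(abs(v) for v in raw)
    if peak == 0.0:
        return [0.0 for _ in raw]
    return [eps * v / peak for v in raw]


def apply_perturbation(x: Sequence[float], delta: Sequence[float]) -> Vector:
    """Add the perturbation to one sample and clip to the range [0, 1]."""
    return [min(1.0, max(0.0, xi + di)) for xi, di in zip(x, delta)]


def fooling_rate(
    classify: Callable[[Sequence[float]], int],
    samples: Sequence[Sequence[float]],
    delta: Sequence[float],
) -> float:
    """Fraction of samples whose predicted label changes under delta."""
    flipped = sum(
        1 for x in samples if classify(apply_perturbation(x, delta)) != classify(x)
    )
    return flipped / len(samples)


def toy_classifier(x: Sequence[float]) -> int:
    """Hypothetical stand-in classifier: label 1 if the mean exceeds 0.5."""
    return int(sum(x) / len(x) > 0.5)


# Three tiny "images" with four pixels each (hypothetical data).
samples = [
    [0.40, 0.45, 0.50, 0.45],
    [0.20, 0.25, 0.30, 0.25],
    [0.10, 0.20, 0.10, 0.20],
]

# Stand-in for the output of a trained generator (hypothetical values).
raw_output = [2.0, 1.0, 2.0, 1.0]

# The same perturbation is reused for every sample: that is what makes it universal.
delta = scale_to_linf_budget(raw_output, eps=0.2)
rate = fooling_rate(toy_classifier, samples, delta)
print(f"fooling rate: {rate:.2f}")
```

In a real attack, the generator would be a trained neural network and the classifier a deep model. Here both are replaced by trivial stand-ins so that the mechanics of adding one shared perturbation stay visible.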