Recent work has shown that state-of-the-art models are highly vulnerable to
adversarial perturbations of the input. We propose cowboy, an approach to
detecting and defending against adversarial attacks that uses both the
discriminator and the generator of a GAN trained on the same dataset as the
target classifier. We show that
the discriminator consistently scores adversarial samples lower than real
samples across multiple attacks and datasets. We provide empirical evidence
that adversarial samples lie outside the data manifold learned by the GAN.
Based on this observation, we propose a cleaning method that uses both the
discriminator and generator of the GAN to project the samples back onto the
data manifold. This cleaning procedure is independent of both the classifier
and the type of attack, and can thus be deployed in existing systems.
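The projection step can be sketched in a few lines. The following is a minimal, hypothetical PyTorch illustration, assuming the cleaning works by searching the generator's latent space for a point close to the input that also scores well under the discriminator; the exact objective, the weighting `lam`, and all names (`G`, `D`, `clean`) are illustrative assumptions, not the paper's precise formulation.

```python
# Minimal sketch: project a (possibly adversarial) input back onto the
# generator's data manifold by optimizing a latent code z. The objective
# (reconstruction loss minus a discriminator term) is an assumption.
import torch

def clean(x, G, D, latent_dim=100, steps=200, lr=0.05, lam=0.1):
    """Return a cleaned version of the batch x lying on the GAN manifold."""
    z = torch.randn(x.size(0), latent_dim, requires_grad=True)
    opt = torch.optim.Adam([z], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        x_hat = G(z)                                  # candidate point on the manifold
        recon = ((x_hat - x) ** 2).flatten(1).sum(1)  # stay close to the input
        score = D(x_hat).flatten(1).mean(1)           # look "real" to the discriminator
        loss = (recon - lam * score).mean()
        loss.backward()
        opt.step()
    return G(z).detach()                              # cleaned sample fed to the classifier
```

Because only the latent code z is updated, the procedure never touches the classifier or assumes anything about the attack, which is what makes it deployable in front of an existing system.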