The vulnerability of deep neural networks to adversarial attacks, such as
adversarial example attacks, has been widely demonstrated. Traditional attacks
apply unstructured, pixel-wise perturbations to fool a classifier. An
alternative approach is to perturb the latent space instead. However, such
perturbations are hard to control because latent representations typically
lack interpretability and disentanglement. In this paper, we propose a more
practical adversarial attack that applies structured perturbations with
semantic meaning. Our technique manipulates the semantic attributes of images
via disentangled latent codes.
latent codes. The intuition behind our technique is that images in similar
domains have some commonly shared but theme-independent semantic attributes,
e.g. thickness of lines in handwritten digits, that can be bidirectionally
mapped to disentangled latent codes. We generate adversarial perturbation by
manipulating a single or a combination of these latent codes and propose two
unsupervised semantic manipulation approaches: vector-based disentangled
representation and feature map-based disentangled representation, in terms of
the complexity of the latent codes and smoothness of the reconstructed images.
We conduct extensive experimental evaluations on real-world image data to
demonstrate the power of our attacks against black-box classifiers. We further
demonstrate the existence of a universal, image-agnostic semantic adversarial
example.
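
To make the core idea concrete, the following minimal sketch illustrates how perturbing a single disentangled latent code could produce a semantic adversarial example. The encode, decode, and classify callables, the attribute index, and the step schedule are all hypothetical placeholders, not the paper's actual implementation:

```python
import numpy as np

def semantic_attack(x, encode, decode, classify, attr_idx,
                    step=0.1, max_steps=50):
    """Perturb one disentangled latent code (e.g., stroke thickness)
    until a black-box classifier's prediction changes.

    encode(x)   -> disentangled latent vector z (hypothetical)
    decode(z)   -> reconstructed image          (hypothetical)
    classify(x) -> label from a black-box model (hypothetical)
    """
    y_true = classify(x)
    z = np.asarray(encode(x), dtype=float)
    for direction in (+1.0, -1.0):       # try both directions along the attribute
        z_adv = z.copy()
        for _ in range(max_steps):
            z_adv[attr_idx] += direction * step   # manipulate one semantic attribute
            x_adv = decode(z_adv)
            if classify(x_adv) != y_true:         # black-box query
                return x_adv                      # semantic adversarial example found
    return None                                   # no flip within the step budget
```

A universal variant would search for a single latent offset that flips the classifier's prediction across many input images rather than one.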