Transferable Perturbations of Deep Feature Distributions

Information in the literature database is collected automatically.

Source

https://arxiv.org/abs/2004.12519

PDF

https://arxiv.org/pdf/2004.12519

Paper Information

Authors: Nathan Inkawhich, Kevin J Liang, Lawrence Carin, Yiran Chen
Published: 2020-04-27
Affiliation: Department of Electrical and Computer Engineering, Duke University
Country: United States of America
Conference: International Conference on Learning Representations (ICLR)

Labels Estimated by AI

Adversarial Attack Methods, Deep Learning Technology, Multi-Class Classification

These labels were automatically added by AI and may be inaccurate.

Abstract

Almost all current adversarial attacks of CNN classifiers rely on information derived from the output layer of the network. This work presents a new adversarial attack based on the modeling and exploitation of class-wise and layer-wise deep feature distributions. We achieve state-of-the-art targeted blackbox transfer-based attack results for undefended ImageNet models. Further, we place a priority on explainability and interpretability of the attacking process. Our methodology affords an analysis of how adversarial attacks change the intermediate feature distributions of CNNs, as well as a measure of layer-wise and class-wise feature distributional separability/entanglement. We also conceptualize a transition from task/data-specific to model-specific features within a CNN architecture that directly impacts the transferability of adversarial examples.
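
To make the idea of attacking an intermediate feature distribution (rather than the output layer) more concrete, the following is a toy sketch and not the authors' implementation. It assumes a made-up one-layer feature extractor (`W`) and a hypothetical logistic model (`V`, `B`) standing in for a learned estimate of how likely a layer's features are to belong to a target class. All names, weights, and hyperparameters are illustrative assumptions. The perturbation is found by signed gradient ascent on the log-probability, projected into an L-infinity ball around the clean input.

```python
import math

# Hypothetical toy "layer": features = tanh(W @ x). Weights are made up.
W = [[0.5, -0.2, 0.1],
     [0.3, 0.8, -0.5]]

# Hypothetical per-layer, per-class model of the feature distribution:
# p(target class | features) = sigmoid(V . features + B). Values are made up.
V = [1.5, -2.0]
B = 0.1


def features(x):
    """Intermediate features of the toy layer."""
    return [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in W]


def target_prob(x):
    """Probability that the layer's features belong to the target class."""
    f = features(x)
    z = sum(v * fi for v, fi in zip(V, f)) + B
    return 1.0 / (1.0 + math.exp(-z))


def grad_log_prob(x):
    """Gradient of log p(target | features(x)) with respect to the input x."""
    f = features(x)
    p = target_prob(x)
    # d log(sigmoid(z)) / dz = 1 - p; dz/dx_j = sum_i V_i * (1 - f_i^2) * W[i][j]
    return [
        (1.0 - p) * sum(V[i] * (1.0 - f[i] ** 2) * W[i][j] for i in range(len(W)))
        for j in range(len(x))
    ]


def feature_space_attack(x0, eps=0.3, step=0.05, iters=20):
    """Signed gradient ascent on the feature-level target probability."""
    x = list(x0)
    for _ in range(iters):
        g = grad_log_prob(x)
        # Move each coordinate in the direction that raises the probability.
        x = [xi + step * ((gi > 0) - (gi < 0)) for xi, gi in zip(x, g)]
        # Project back into the L-infinity ball of radius eps around x0.
        x = [min(max(xi, x0i - eps), x0i + eps) for xi, x0i in zip(x, x0)]
    return x


x_clean = [0.2, 0.6, -0.1]
x_adv = feature_space_attack(x_clean)
print(target_prob(x_clean), target_prob(x_adv))
```

In this toy setting the perturbed input raises the feature-level target probability while staying within the `eps` budget. In a real transfer attack the feature extractor would be a layer of a whitebox CNN, and the choice of layer would matter for transferability, as the abstract notes.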

External Datasets

ImageNet-1K training set

ImageNet-1K validation set