Masked Language Model Based Textual Adversarial Example Detection

ICCV

Adversarial Example Detection Using Latent Neighborhood Graph

Ahmed A. Abusnaina, Yuhang Wu, Sunpreet S. Arora, Yizhen Wang, Fei Wang, Hao Yang, David A. Mohaisen

Published: 2021

Toward Mitigating Adversarial Texts

Basemah Alshemali, Jugal Kumar Kalita

Published: 2019

EMNLP

Generating Natural Language Adversarial Examples

Moustafa Farid Alzantot, Yash Sharma, Ahmed Elgohary, Bo-Jhang Ho, Mani B. Srivastava, Kai-Wei Chang

Published: 2018

Oakland

Membership Inference Attacks From First Principles

Nicholas Carlini, Steve Chien, Milad Nasr, Shuang Song, A. Terzis, Florian Tramèr

Published: 2022

AISec

Adversarial Examples Are Not Easily Detected: Bypassing Ten Detection Methods

Nicholas Carlini, David A. Wagner

Published: 2017

NIPS

InfoGAN: Interpretable Representation Learning by Information Maximizing Generative Adversarial Nets

Xi Chen, Yan Duan, Rein Houthooft, John Schulman, Ilya Sutskever, P. Abbeel

Published: 2016

Proceedings of NAACL-HLT

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

Jacob Devlin, Ming-Wei Chang, Kenton Lee, Kristina Toutanova

Published: 2019

ICLR

Towards Robustness Against Natural Language Word Substitutions

Xinshuai Dong, Anh Tuan Luu, Rongrong Ji, Hong Liu

Published: 2021

ACL

HotFlip: White-Box Adversarial Examples for Text Classification

J. Ebrahimi, Anyi Rao, Daniel Lowd, Dejing Dou

Published: 2018

EMNLP

Text Processing Like Humans Do: Visually Attacking and Shielding NLP Systems

Steffen Eger, Gözde Gül Sahin, Andreas Rücklé, Ji-Ung Lee, Claudia Schulz, Mohsen Mesgar, Krishnkant Swarnkar, Edwin Simpson, Iryna Gurevych

Published: 2019

2018 IEEE Security and Privacy Workshops (SPW)

Black-Box Generation of Adversarial Text Sequences to Evade Deep Learning Classifiers

Ji Gao, Jack Lanchantin, Mary Lou Soffa, Yanjun Qi

Published: 2018

EMNLP

BAE: BERT-based Adversarial Examples for Text Classification

Siddhant Garg, Goutham Ramakrishnan

Published: 2020

arXiv preprint

Adversarial spheres

Justin Gilmer, Luke Metz, Fartash Faghri, Samuel S Schoenholz, Maithra Raghu, Martin Wattenberg, Ian Goodfellow

Published: 2018

NIPS

Generative Adversarial Nets

Ian J. Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron C. Courville, Yoshua Bengio

Published: 2014

ICLR

Explaining and harnessing adversarial examples

Ian J. Goodfellow, Jonathon Shlens, Christian Szegedy

Published: 2015

ACL

Don’t Stop Pretraining: Adapt Language Models to Domains and Tasks

Suchin Gururangan, Ana Marasović, Swabha Swayamdipta, Kyle Lo, Iz Beltagy, Doug Downey, Noah A. Smith

Published: 2020

ACL

Pretrained Transformers Improve Out-of-Distribution Robustness

Dan Hendrycks, Xiaoyuan Liu, Eric Wallace, Adam Dziedzic, Rishabh Krishnan, Dawn Xiaodong Song

Published: 2020

Neural Computation

Long short-term memory

S. Hochreiter, J. Schmidhuber

Published: 1997

NAACL

Adversarial Example Generation with Syntactically Controlled Paraphrase Networks

Mohit Iyyer, John Wieting, Kevin Gimpel, Luke Zettlemoyer

Published: 2018

AAAI

Is BERT Really Robust? A Strong Baseline for Natural Language Attack on Text Classification and Entailment

Di Jin, Zhijing Jin, Joey Tianyi Zhou, Peter Szolovits

Published: 2020

International Conference on Learning Representations

ALBERT: A Lite BERT for Self-supervised Learning of Language Representations

Z. Lan, M. Chen, S. Goodman, K. Gimpel, P. Sharma, R. Soricut

Published: 2020

NAACL

Contextualized Perturbation for Textual Adversarial Attack

Dianqi Li, Yizhe Zhang, Hao Peng, Liqun Chen, Chris Brockett, Ming-Ting Sun, Bill Dolan

Published: 2021

NDSS

TextBugger: Generating Adversarial Text Against Real-world Applications

Jinfeng Li, Shouling Ji, Tianyu Du, Bo Li, Ting Wang

Published: 2019

EMNLP

BERT-ATTACK: Adversarial Attack against BERT Using BERT

Linyang Li, Ruotian Ma, Qipeng Guo, X. Xue, Xipeng Qiu

Published: 2020

RoBERTa: A Robustly Optimized BERT Pretraining Approach

Liu, Y.

Published: 2019

Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies

Learning word vectors for sentiment analysis

Andrew L. Maas, Raymond E. Daly, Peter T. Pham, Dan Huang, Andrew Y. Ng, Christopher Potts

Published: 2011

AAAI

Generating Natural Language Attacks in a Hard Label Black Box Setting

Rishabh Maheshwary, Saket Maheshwary, Vikram Pudi

Published: 2021

EMNLP

A Strong Baseline for Query Efficient Attacks in a Black Box Setting

Rishabh Maheshwary, Saket Maheshwary, Vikram Pudi

Published: 2021

Annual ACM Conference on Computer and Communications Security (CCS)

MagNet: a Two-Pronged Defense against Adversarial Examples

Dongyu Meng, Hao Chen

Published: 2017

Deep learning has shown promising results on hard perceptual problems in recent years. However, deep learning systems are found to be vulnerable to small adversarial perturbations that are nearly imperceptible to human. Such specially crafted perturbations cause deep learning systems to output incorrect decisions, with potentially disastrous consequences. These vulnerabilities hinder the deployment of deep learning systems where safety or security is important. Attempts to secure deep learning systems either target specific attacks or have been shown to be ineffective. In this paper, we propose MagNet, a framework for defending neural network classifiers against adversarial examples. MagNet does not modify the protected classifier or know the process for generating adversarial examples. MagNet includes one or more separate detector networks and a reformer network. Different from previous work, MagNet learns to differentiate between normal and adversarial examples by approximating the manifold of normal examples. Since it does not rely on any process for generating adversarial examples, it has substantial generalization power. Moreover, MagNet reconstructs adversarial examples by moving them towards the manifold, which is effective for helping classify adversarial examples with small perturbation correctly. We discuss the intrinsic difficulty in defending against whitebox attack and propose a mechanism to defend against graybox attack. Inspired by the use of randomness in cryptography, we propose to use diversity to strengthen MagNet. We show empirically that MagNet is effective against most advanced state-of-the-art attacks in blackbox and graybox scenarios while keeping false positive rate on normal examples very low.

KDD

GradMask: Gradient-Guided Token Masking for Textual Adversarial Example Detection

Han Cheol Moon, Shafiq R. Joty, Xu Chi

Published: 2022

EMNLP

TextAttack: A Framework for Adversarial Attacks, Data Augmentation, and Adversarial Training in NLP

John X. Morris, Eli Lifland, Jin Yong Yoo, Jake Grigsby, Di Jin, Yanjun Qi

Published: 2020

ACL

“That Is a Suspicious Reaction!”: Interpreting Logits Variation to Detect NLP Adversarial Attacks

Edoardo Mosca, Shreyash Agarwal, Javier Rando-Ramirez, George Louis Groh

Published: 2022

EACL

Frequency-Guided Word Substitutions for Detecting Textual Adversarial Examples

Maximilian Mozes, Pontus Stenetorp, Bennett Kleinberg, Lewis D. Griffin

Published: 2021

EMNLP

SSMBA: Self-Supervised Manifold Based Data Augmentation for Improving Out-of-Domain Robustness

Nathan Ng, Kyunghyun Cho, Marzyeh Ghassemi

Published: 2020

EMNLP

Textual Manifold-based Defense Against Natural Language Adversarial Examples

Dang Minh Nguyen, Anh Tuan Luu

Published: 2022

ACL

Combating Adversarial Misspellings with Robust Word Recognition

Danish Pruthi, Bhuwan Dhingra, Zachary Chase Lipton

Published: 2019

Annual Meeting of the Association for Computational Linguistics (ACL)

Generating Natural Language Adversarial Examples through Probability Weighted Word Saliency

Shuhuai Ren, Yihe Deng, Kun He, Wanxiang Che

Published: 2019

The Dimpled Manifold Model of Adversarial Examples in Machine Learning

Adi Shamir, Odelia Melamed, Oriel BenShmuel

Published: 2021

Conference on Empirical Methods in Natural Language Processing

Recursive deep models for semantic compositionality over a sentiment treebank

Richard Socher, Alex Perelygin, Jean Wu, Jason Chuang, Christopher D. Manning, Andrew Y. Ng, Christopher Potts

Published: 2013

ICLR

Intriguing properties of neural networks

C. Szegedy, W. Zaremba, I. Sutskever, J. Bruna, D. Erhan, I. Goodfellow, R. Fergus

Published: 2014

A Boundary Tilting Perspective on the Phenomenon of Adversarial Examples

Thomas Tanay, Lewis D. Griffin

Published: 2016

IEEE/ACM Transactions on Audio, Speech, and Language Processing

Rethinking Textual Adversarial Defense for Pre-Trained Language Models

Jiayi Wang, Rongzhou Bao, Zhuosheng Zhang, Hai Zhao

Published: 2022

EMNLP

CAT-Gen: Improving Robustness in NLP Models via Controlled Adversarial Text Generation

Tianlu Wang, Xuezhi Wang, Yao Qin, Ben Packer, Kang Li, Jilin Chen, Alex Beutel, Ed H. Chi

Published: 2020

UAI

Natural language adversarial defense through synonym encoding

Xiaosen Wang, Hao Jin, Yichen Yang, Kun He

Published: 2021

UAI

Detecting textual adversarial examples through randomized substitution and vote

Xiaosen Wang, Yifeng Xiong, Kun He

Published: 2022

ACL

Unsupervised Out-of-Domain Detection via Pre-trained Transformers

Keyang Xu, Tongzheng Ren, Shikun Zhang, Yihao Feng, Caiming Xiong

Published: 2021

Class-Disentanglement and Applications in Adversarial Detection and Defense

Kaiwen Yang, Tianyi Zhou, Yonggang Zhang, Xinmei Tian, Dacheng Tao

Published: 2021

ACL

SAFER: A Structure-free Approach for Certified Robustness to Adversarial Word Substitutions

Mao Ye, Chengyue Gong, Qiang Liu

Published: 2020

Advances in Neural Information Processing Systems

Character-level convolutional networks for text classification

X. Zhang, J. Zhao, Y. LeCun

Published: 2015

Comput. J.

Evaluating Membership Inference Through Adversarial Robustness

Zhaoxi Zhang, Leo Yu Zhang, Xufei Zheng, Bilal Hussain Abbasi, Shengshan Hu

Published: 2022

TrustCom

Self-Supervised Adversarial Example Detection by Disentangled Representation

Zhaoxi Zhang, Leo Yu Zhang, Xufei Zheng, Jinyu Tian, Jiantao Zhou

Published: 2022

Generating Natural Adversarial Examples

Zhengli Zhao, Dheeru Dua, Sameer Singh

Published: 2017

EMNLP-IJCNLP

Learning to Discriminate Perturbations for Blocking Adversarial Attacks in Text Classification

Yichao Zhou, Jyun-Yu Jiang, Kai-Wei Chang, Wei Wang

Published: 2019

ACL

Defense against Adversarial Attacks in NLP via Dirichlet Neighborhood Ensemble

Yi Zhou, Xiaoqing Zheng, Cho-Jui Hsieh, Kai-Wei Chang, Xuanjing Huang

Published: 2021