We present a probabilistic framework for studying adversarial attacks on
discrete data. Based on this framework, we derive a perturbation-based method,
Greedy Attack, and a scalable learning-based method, Gumbel Attack, that
illustrate various tradeoffs in the design of attacks. We demonstrate the
effectiveness of these methods using both quantitative metrics and human
evaluation on various state-of-the-art models for text classification,
including a word-based CNN, a character-based CNN, and an LSTM. As an example of
our results, we show that the accuracy of character-based convolutional
networks drops to the level of random selection by modifying only five
characters through Greedy Attack.