Backdoor attacks are a kind of emergent training-time threat to deep neural
networks (DNNs). They can manipulate the output of DNNs and possess high
insidiousness. In the field of natural language processing, some attack methods
have been proposed and achieve very high attack success rates on multiple
popular models. Nevertheless, there are few studies on defending against
textual backdoor attacks. In this paper, we propose a simple and effective
textual backdoor defense named ONION, which is based on outlier word detection
and, to the best of our knowledge, is the first method that can handle all the
textual backdoor attack situations. Experiments demonstrate the effectiveness
of our model in defending BiLSTM and BERT against five different backdoor
attacks. All the code and data of this paper can be obtained at
https://github.com/thunlp/ONION.