It has been shown that deep neural networks face a new threat called
backdoor attacks, where the adversary can inject backdoors into a neural
network model by poisoning the training dataset. When an input contains a
special pattern called the backdoor trigger, the backdoored model carries
out a malicious behavior, such as a misclassification specified by the
adversary. In text classification systems, backdoors inserted into the
models can allow spam or malicious speech to escape detection. Previous
work has mainly focused on defending against backdoor attacks in computer
vision; little attention has been paid to defense methods against RNN
backdoor attacks in text classification.
In this paper, by analyzing the changes in internal LSTM neurons, we
propose a defense method called Backdoor Keyword Identification (BKI) to
mitigate backdoor attacks that the adversary mounts against LSTM-based
text classification via data poisoning. Our method identifies and removes
poisoned samples, which are crafted to insert a backdoor into the model,
from the training data without requiring a verified and trusted dataset.
We evaluate our method on four different text classification datasets:
IMDB, DBpedia ontology, 20 Newsgroups, and Reuters-21578. It achieves good
performance on all of them regardless of the trigger sentences.
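
To make the intuition concrete, the following is a minimal sketch, not the exact BKI algorithm: each training token is scored by how strongly it perturbs the LSTM hidden state, and tokens with anomalously high average impact across the training set are flagged as candidate backdoor keywords. The `lstm`, `embed`, `id_to_word`, and top-k cutoff names are illustrative assumptions, not components defined in the paper.

```python
import torch
from collections import defaultdict

def word_impact_scores(lstm, embed, token_ids):
    # Score each token by the jump it induces in the LSTM hidden state.
    # Assumes `lstm` is a trained seq-first torch.nn.LSTM and `embed` the
    # matching torch.nn.Embedding from the classifier under inspection.
    with torch.no_grad():
        emb = embed(token_ids).unsqueeze(1)        # (T, 1, E)
        outputs, _ = lstm(emb)                     # (T, 1, H)
        h = outputs.squeeze(1)                     # (T, H)
        prev = torch.cat([torch.zeros(1, h.size(1), device=h.device),
                          h[:-1]], dim=0)
        return (h - prev).norm(dim=1)              # impact of each token

def find_backdoor_keywords(dataset, lstm, embed, id_to_word, k=5):
    # Aggregate per-token impact over the (possibly poisoned) training set
    # and return the k tokens with the highest mean impact as candidates.
    stats = defaultdict(list)
    for token_ids, _label in dataset:              # token_ids: 1-D LongTensor
        scores = word_impact_scores(lstm, embed, token_ids)
        for tok, s in zip(token_ids.tolist(), scores.tolist()):
            stats[tok].append(s)
    ranked = sorted(stats, key=lambda t: sum(stats[t]) / len(stats[t]),
                    reverse=True)
    return [id_to_word[t] for t in ranked[:k]]
```

In this sketch, training samples containing an identified keyword would then be excluded before retraining, which reflects how poisoned data can be filtered without access to a trusted clean dataset.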