De-Pois: An Attack-Agnostic Defense against Data Poisoning Attacks

arxiv

AI Security Portal bot

Information in the literature database is collected automatically.

Source

https://arxiv.org/abs/2105.03592

PDF

https://arxiv.org/pdf/2105.03592

Bibliographic information

Authors: Jian Chen; Xuxin Zhang; Rui Zhang; Chen Wang; Ling Liu
Published: 2021-05-08
Affiliation: Internet Technology and Engineering R&D Center (ITEC), School of Electronic Information and Communications, Huazhong University of Science and Technology
Country of affiliation: China
Conference name:

Labels estimated by AI

Poisoning; Content specialized for toxicity attacks; Challenges of generative models

* These labels were added automatically by AI and may therefore be inaccurate.
For details, see "About the literature database".

Abstract

Machine learning techniques have been widely applied to various applications. However, they are potentially vulnerable to data poisoning attacks, where sophisticated attackers can disrupt the learning procedure by injecting a fraction of malicious samples into the training dataset. Existing defense techniques against poisoning attacks are largely attack-specific: they are designed for one specific type of attack but do not work for other types, mainly due to the distinct principles they follow. Yet few general defense strategies have been developed. In this paper, we propose De-Pois, an attack-agnostic defense against poisoning attacks. The key idea of De-Pois is to train a mimic model whose purpose is to imitate the behavior of the target model trained on clean samples. We take advantage of Generative Adversarial Networks (GANs) to facilitate informative training data augmentation as well as the mimic model construction. By comparing the prediction differences between the mimic model and the target model, De-Pois is thus able to distinguish the poisoned samples from clean ones, without explicit knowledge of any ML algorithms or types of poisoning attacks. We implement four types of poisoning attacks and evaluate De-Pois against five typical defense methods on different realistic datasets. The results demonstrate that De-Pois is effective and efficient for detecting poisoned data against all four types of poisoning attacks, with both the accuracy and F1-score over 0.9 on average.
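
The detection step described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: it assumes the mimic model has already been trained and yields one score per sample (for example, its confidence in the sample's given label), and it assumes a simple cutoff a few standard deviations below the mean score on trusted clean data. The function names, the `z` parameter, and all numbers are hypothetical and serve only to show the flow from scores to flags to the accuracy and F1-score metrics mentioned above.

```python
from statistics import mean, stdev


def fit_threshold(clean_scores, z=1.96):
    """Return a lower cutoff on mimic-model scores (assumed rule, not from the paper).

    clean_scores: mimic-model scores on samples trusted to be clean.
    z: hypothetical number of standard deviations below the mean.
    """
    return mean(clean_scores) - z * stdev(clean_scores)


def flag_poisoned(scores, threshold):
    """Flag a sample as poisoned when its mimic-model score falls below the cutoff."""
    return [s < threshold for s in scores]


def accuracy_and_f1(flags, truth):
    """Compute detection accuracy and F1-score, treating 'poisoned' as the positive class."""
    tp = sum(f and t for f, t in zip(flags, truth))
    fp = sum(f and not t for f, t in zip(flags, truth))
    fn = sum(not f and t for f, t in zip(flags, truth))
    tn = sum(not f and not t for f, t in zip(flags, truth))
    accuracy = (tp + tn) / len(truth)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return accuracy, f1


# Toy numbers for illustration only; they are not results from the paper.
clean_scores = [0.90, 0.92, 0.88, 0.95, 0.91]
threshold = fit_threshold(clean_scores)
flags = flag_poisoned([0.93, 0.20, 0.89, 0.40], threshold)
accuracy, f1 = accuracy_and_f1(flags, [False, True, False, True])
```

Because the cutoff depends only on the mimic model's scores, a sketch like this needs no knowledge of which poisoning attack produced the suspicious samples, which is the sense in which the abstract calls the defense attack-agnostic.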