Certified Defenses for Data Poisoning Attacks

arXiv

AI Security Portal bot

Information in the literature database is collected automatically.

Source

https://arxiv.org/abs/1706.03691

PDF

https://arxiv.org/pdf/1706.03691

Bibliographic information

Authors: Jacob Steinhardt, Pang Wei Koh, Percy Liang
Published: 2017-06-10
Updated: 2017-11-24
Affiliation: Stanford University
Country of affiliation: United States of America
Conference: NIPS

Labels estimated by AI

Poisoned data detection, Optimization problem, Poisoning

* These labels were added automatically by AI, so they may be inaccurate.
For details, see "About the literature database".

Abstract

Machine learning systems trained on user-provided data are susceptible to data poisoning attacks, whereby malicious users inject false training data with the aim of corrupting the learned model. While recent work has proposed a number of attacks and defenses, little is understood about the worst-case loss of a defense in the face of a determined attacker. We address this by constructing approximate upper bounds on the loss across a broad family of attacks, for defenders that first perform outlier removal followed by empirical risk minimization. Our approximation relies on two assumptions: (1) that the dataset is large enough for statistical concentration between train and test error to hold, and (2) that outliers within the clean (non-poisoned) data do not have a strong effect on the model. Our bound comes paired with a candidate attack that often nearly matches the upper bound, giving us a powerful tool for quickly assessing defenses on a given dataset. Empirically, we find that even under a simple defense, the MNIST-1-7 and Dogfish datasets are resilient to attack, while in contrast the IMDB sentiment dataset can be driven from 12% to 23% test error by adding only 3% poisoned data.
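The defender pipeline named in the abstract (outlier removal followed by empirical risk minimization) can be illustrated with a minimal sketch. The abstract does not specify the outlier rule or the loss, so everything below is an assumption made for illustration: a filter that drops points far from their class centroid, a linear model trained on the hinge loss, and the hypothetical names `sphere_filter`, `fit_hinge_erm`, and `defend_and_train`.

```python
import numpy as np

def sphere_filter(X, y, percentile=95.0):
    """Assumed outlier rule: keep points whose distance to their own class
    centroid is at or below the given percentile of distances in that class."""
    keep = np.zeros(len(y), dtype=bool)
    for label in np.unique(y):
        idx = np.where(y == label)[0]
        centroid = X[idx].mean(axis=0)
        dist = np.linalg.norm(X[idx] - centroid, axis=1)
        radius = np.percentile(dist, percentile)
        keep[idx[dist <= radius]] = True
    return keep

def fit_hinge_erm(X, y, lr=0.1, epochs=200, reg=1e-3):
    """Assumed ERM step: minimize average hinge loss plus L2 regularization
    by full-batch subgradient descent. Labels must be in {-1, +1}."""
    w = np.zeros(X.shape[1])
    b = 0.0
    n = len(y)
    for _ in range(epochs):
        margins = y * (X @ w + b)
        active = margins < 1  # points that currently incur hinge loss
        grad_w = -(y[active, None] * X[active]).sum(axis=0) / n + reg * w
        grad_b = -y[active].sum() / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

def defend_and_train(X, y, percentile=95.0):
    """Outlier removal first, then ERM on the surviving points."""
    keep = sphere_filter(X, y, percentile)
    w, b = fit_hinge_erm(X[keep], y[keep])
    return w, b, keep
```

In this sketch the centroids are computed from the possibly poisoned training set itself, so an attacker who places points inside the accepted radius is not filtered out. That gap between a defense that looks reasonable and its worst-case behavior is the kind of question the abstract's upper bounds are meant to address.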

External datasets

MNIST-1-7

Dogfish

IMDB

Enron