Modern machine learning increasingly requires training on a large collection
of data from multiple sources, not all of which can be trusted. A particularly
concerning scenario is when a small fraction of poisoned data changes the
behavior of the trained model when triggered by an attacker-specified
watermark. Such a compromised model can be deployed unnoticed, since it remains
accurate on clean inputs. There have been promising attempts to use the intermediate
representations of such a model to separate corrupted examples from clean ones.
However, these defenses work only when a certain spectral signature of the
poisoned examples is large enough for detection. This leaves a wide range of
attacks against which existing defenses offer no protection. We propose a
novel defense algorithm using robust covariance estimation to amplify the
spectral signature of corrupted data. This defense provides a clean model,
completely removing the backdoor, even in regimes where previous methods have
no hope of detecting the poisoned examples. Code and pre-trained models are
available at https://github.com/SewoongLab/spectre-defense.
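To make the idea concrete, here is a minimal sketch of a defense of this flavor: whiten the intermediate representations using a robustly estimated covariance, then score each example by its projection onto the top singular direction and flag the highest-scoring examples for removal. This is an illustrative stand-in under simple assumptions, not the paper's exact algorithm; in particular, `trimmed_mean_cov` below is a hypothetical placeholder for the robust covariance estimator the defense relies on.

```python
import numpy as np

def trimmed_mean_cov(X, trim_frac=0.05, n_iter=3):
    # Crude robust estimate: iteratively drop the points farthest from
    # the current mean, then re-estimate mean and covariance.
    # (Hypothetical placeholder for the paper's robust estimator.)
    kept = X
    for _ in range(n_iter):
        mu = kept.mean(axis=0)
        dists = np.linalg.norm(kept - mu, axis=1)
        k = int(len(kept) * (1 - trim_frac))
        kept = kept[np.argsort(dists)[:k]]
    return kept.mean(axis=0), np.cov(kept, rowvar=False)

def spectral_scores(X, trim_frac=0.05, eps=1e-6):
    # Whiten the representations with the robustly estimated covariance,
    # then score each example by its squared projection onto the top
    # singular direction of the whitened, centered data.
    mu, cov = trimmed_mean_cov(X, trim_frac)
    w, V = np.linalg.eigh(cov + eps * np.eye(cov.shape[0]))
    whitener = V @ np.diag(w ** -0.5) @ V.T   # Sigma^{-1/2}
    Z = (X - mu) @ whitener
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    return (Z @ Vt[0]) ** 2                   # larger => more suspicious

# Toy usage: X would hold intermediate-layer representations (n x d);
# the examples with the largest scores are flagged for removal.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 32))   # stand-in for real features
scores = spectral_scores(X)
suspects = np.argsort(scores)[-50:]   # e.g., drop the top 5%
```

With a plain empirical covariance in place of the robust estimate, this reduces to the earlier spectral-signature style of defense; the robust estimation step is what amplifies the signature of the corrupted data when the poisoned fraction is small.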