Bypassing Backdoor Detection Algorithms in Deep Learning

Information in this literature database is collected automatically.

Source

https://arxiv.org/abs/1905.13409

PDF

https://arxiv.org/pdf/1905.13409

Bibliographic information

Authors: Te Juin Lester Tan, Reza Shokri
Published: 2019-05-31
Updated: 2020-06-07
Affiliation: Department of Computer Science
Country of affiliation: Singapore
Conference: European Symposium on Security and Privacy (EuroS&P)

Labels estimated by AI

Membership inference, Adversarial attack methods, Pruning methods

* These labels were added automatically by AI and may therefore be inaccurate.
For details, see "About the literature database".

Abstract

Deep learning models are vulnerable to various adversarial manipulations of their training data, parameters, and input sample. In particular, an adversary can modify the training data and model parameters to embed backdoors into the model, so the model behaves according to the adversary's objective if the input contains the backdoor features, referred to as the backdoor trigger (e.g., a stamp on an image). The poisoned model's behavior on clean data, however, remains unchanged. Many detection algorithms are designed to detect backdoors on input samples or model parameters, through the statistical difference between the latent representations of adversarial and clean input samples in the poisoned model. In this paper, we design an adversarial backdoor embedding algorithm that can bypass the existing detection algorithms including the state-of-the-art techniques. We design an adaptive adversarial training algorithm that optimizes the original loss function of the model, and also maximizes the indistinguishability of the hidden representations of poisoned data and clean data. This work calls for designing adversary-aware defense mechanisms for backdoor detection.
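
The abstract describes a training objective with two parts: the model's original loss, plus a term that makes the hidden representations of poisoned and clean inputs hard to tell apart. The following is a minimal sketch of that idea under a strong simplifying assumption: the indistinguishability term is replaced by the squared distance between the mean latent vectors of the two groups. This is a stand-in chosen for brevity, not the authors' algorithm. The names `latent_gap`, `adversarial_embedding_loss`, and the weight `lam` are hypothetical.

```python
import math


def cross_entropy(probs, label):
    """Negative log-likelihood of the true label under predicted probabilities."""
    return -math.log(max(probs[label], 1e-12))


def mean_vector(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def latent_gap(clean_latents, poisoned_latents):
    """Squared Euclidean distance between the mean clean and mean poisoned latents.

    Assumption: a simple proxy for how distinguishable the two groups are.
    A value of zero means the group means coincide.
    """
    mean_clean = mean_vector(clean_latents)
    mean_poisoned = mean_vector(poisoned_latents)
    return sum((a - b) ** 2 for a, b in zip(mean_clean, mean_poisoned))


def adversarial_embedding_loss(task_losses, clean_latents, poisoned_latents, lam=1.0):
    """Average task loss plus a weighted penalty on the latent gap (hypothetical)."""
    task = sum(task_losses) / len(task_losses)
    return task + lam * latent_gap(clean_latents, poisoned_latents)


if __name__ == "__main__":
    # Toy batch: per-sample task losses and 2-D latent vectors.
    losses = [cross_entropy([0.9, 0.1], 0), cross_entropy([0.2, 0.8], 1)]
    clean = [[0.0, 0.0], [2.0, 2.0]]
    poisoned = [[1.0, 1.5]]
    print(adversarial_embedding_loss(losses, clean, poisoned, lam=0.5))
```

Minimizing this combined value pushes the model to fit its labels while pulling the poisoned latents toward the clean ones. The weight `lam` trades off the two goals.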

External datasets

CIFAR-10

GTSRB