MISA: Online Defense of Trojaned Models using Misattributions

arXiv

AI Security Portal bot

Information in the literature database is collected automatically.

Source

https://arxiv.org/abs/2103.15918

PDF

https://arxiv.org/pdf/2103.15918

Bibliographic information

Authors: Panagiota Kiourti; Wenchao Li; Anirban Roy; Karan Sikka; Susmit Jha
Published: 2021-03-30
Updated: 2021-09-24
Affiliation: Boston University
Country of affiliation: United States of America
Conference: Annual Computer Security Applications Conference (ACSAC)

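Labels estimated by AI

Anomaly detection methods; Threat modeling; Vulnerability to adversarial examples

Note: These labels were added automatically by AI and may therefore be inaccurate.
For details, see "About the Literature Database".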

Abstract

Recent studies have shown that neural networks are vulnerable to Trojan attacks, where a network is trained to respond to specially crafted trigger patterns in the inputs in specific and potentially malicious ways. This paper proposes MISA, a new online approach to detect Trojan triggers for neural networks at inference time. Our approach is based on a novel notion called misattributions, which captures the anomalous manifestation of a Trojan activation in the feature space. Given an input image and the corresponding output prediction, our algorithm first computes the model's attribution on different features. It then statistically analyzes these attributions to ascertain the presence of a Trojan trigger. Across a set of benchmarks, we show that our method can effectively detect Trojan triggers for a wide variety of trigger patterns, including several recent ones for which there are no known defenses. Our method achieves 96% AUC for detecting images that include a Trojan trigger without any assumptions on the trigger pattern.
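The two steps the abstract describes (compute per-feature attributions, then analyze them statistically) can be illustrated with a toy sketch. This is not the MISA algorithm: the linear scorer, the gradient-times-input attribution, the top-k concentration statistic, and the mean-plus-three-sigma threshold are all assumptions chosen for illustration, and every name below (`attributions`, `concentration`, `calibrate`, `is_suspicious`, `TRIGGER_IDX`) is hypothetical. The trigger location is used only to build the toy model and inputs; the detector itself never looks at it.

```python
import random
import statistics

N_FEATURES = 64
TRIGGER_IDX = range(4)  # hypothetical trigger location, used only to build the toy

# Hypothetical "Trojaned" linear scorer: a few features carry oversized weights.
weights = [25.0 if i in TRIGGER_IDX else 1.0 for i in range(N_FEATURES)]


def attributions(w, x):
    """Gradient-times-input attribution for the linear score sum(w_i * x_i)."""
    return [wi * xi for wi, xi in zip(w, x)]


def concentration(attr, k=4):
    """Fraction of total absolute attribution carried by the k largest features."""
    mags = sorted((abs(a) for a in attr), reverse=True)
    total = sum(mags)
    return sum(mags[:k]) / total if total > 0 else 0.0


def calibrate(w, clean_inputs, k=4, n_sigma=3.0):
    """Assumed threshold rule: mean + n_sigma * stdev of the clean-input statistic."""
    scores = [concentration(attributions(w, x), k) for x in clean_inputs]
    return statistics.mean(scores) + n_sigma * statistics.stdev(scores)


def is_suspicious(w, x, threshold, k=4):
    """Flag an input whose attribution is unusually concentrated in few features."""
    return concentration(attributions(w, x), k) > threshold


# Clean calibration data: trigger features are inactive, the rest vary randomly.
rng = random.Random(0)
clean_inputs = [
    [0.0 if i in TRIGGER_IDX else rng.uniform(0.5, 1.0) for i in range(N_FEATURES)]
    for _ in range(200)
]
threshold = calibrate(weights, clean_inputs)

# Two probes that differ only in whether the trigger features are switched on.
clean_probe = [0.0 if i in TRIGGER_IDX else 0.75 for i in range(N_FEATURES)]
triggered_probe = [1.0 if i in TRIGGER_IDX else 0.75 for i in range(N_FEATURES)]

print(is_suspicious(weights, clean_probe, threshold))      # False
print(is_suspicious(weights, triggered_probe, threshold))  # True
```

In this toy, the triggered probe places roughly 69% of its absolute attribution on four features, far above anything seen during calibration, so it is flagged. A real detector would compute attributions on a neural network and would need a statistic that makes no assumption about trigger size or shape, as the abstract claims for MISA.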