Unlearning Backdoor Attacks through Gradient-Based Model Pruning

arXiv

Source

https://arxiv.org/abs/2405.03918

PDF

https://arxiv.org/pdf/2405.03918

Bibliographic information

Authors: Kealan Dunnett, Reza Arablouei, Dimity Miller, Volkan Dedeoglu, Raja Jurdak
Published: 2024-05-07
Affiliation: Queensland University of Technology
Country: Australia
Venue: DSN-W

AI-estimated labels

Backdoor attacks, Model performance evaluation

Note: These labels were added automatically by AI and may therefore be inaccurate.

Abstract

In the era of increasing concerns over cybersecurity threats, defending against backdoor attacks is paramount in ensuring the integrity and reliability of machine learning models. However, many existing approaches require substantial amounts of data for effective mitigation, posing significant challenges in practical deployment. To address this, we propose a novel approach to counter backdoor attacks by treating their mitigation as an unlearning task. We tackle this challenge through a targeted model pruning strategy, leveraging unlearning loss gradients to identify and eliminate backdoor elements within the model. Built on solid theoretical insights, our approach offers simplicity and effectiveness, rendering it well-suited for scenarios with limited data availability. Our methodology includes formulating a suitable unlearning loss and devising a model-pruning technique tailored for convolutional neural networks. Comprehensive evaluations demonstrate the efficacy of our proposed approach compared to state-of-the-art approaches, particularly in realistic data settings.
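The abstract describes the method only at a high level. The following is a minimal illustrative sketch of gradient-guided pruning for backdoor unlearning. It is not the authors' loss or pruning rule, and it rests on several assumptions:

- A toy linear read-out stands in for the last convolutional layer and classifier. The target-class logit is z(x) = sum_c m_c * w_c * x_c, where x_c is channel c's activation and m_c is a per-channel pruning mask.
- A hypothetical unlearning loss is used: the mean target-class logit over a few trigger-stamped samples.
- Channels are ranked by the gradient of that loss with respect to their mask, and the highest-scoring ones are zeroed.

All function names and example values are hypothetical.

```python
"""Toy sketch of gradient-guided channel pruning for backdoor unlearning.

Assumptions (hypothetical, not the paper's formulation):
- target-class logit z(x) = sum_c m_c * w_c * x_c, with m_c a 0/1 pruning mask;
- unlearning loss L = mean target-class logit over trigger-stamped samples,
  so lowering L weakens the backdoor response.
"""
from typing import List, Sequence

Samples = Sequence[Sequence[float]]


def target_logit(x: Sequence[float], w: Sequence[float], m: Sequence[float]) -> float:
    # Masked linear read-out standing in for conv features + classifier head.
    return sum(mc * wc * xc for mc, wc, xc in zip(m, w, x))


def unlearning_loss(samples: Samples, w: Sequence[float], m: Sequence[float]) -> float:
    # Mean target-class logit on triggered inputs; minimising it unlearns the trigger.
    return sum(target_logit(x, w, m) for x in samples) / len(samples)


def mask_gradients(samples: Samples, w: Sequence[float]) -> List[float]:
    # Analytic dL/dm_c = mean over samples of w_c * x_c (constant in m for this toy model).
    n = len(samples)
    return [sum(w[c] * x[c] for x in samples) / n for c in range(len(w))]


def prune_by_gradient(samples: Samples, w: Sequence[float], num_prune: int) -> List[float]:
    # First-order view: switching m_c from 1 to 0 changes L by about -dL/dm_c,
    # so pruning the channels with the largest positive gradient lowers L the most.
    grads = mask_gradients(samples, w)
    ranked = sorted(range(len(w)), key=lambda c: grads[c], reverse=True)
    pruned = set(ranked[:num_prune])
    return [0.0 if c in pruned else 1.0 for c in range(len(w))]


if __name__ == "__main__":
    w = [0.5, 2.0, -1.0, 0.1]
    # Hypothetical activations of 4 channels on 3 trigger-stamped samples;
    # channel 1 fires strongly whenever the trigger is present.
    triggered = [[0.2, 3.0, 0.1, 0.4], [0.1, 2.5, 0.3, 0.2], [0.3, 2.8, 0.0, 0.1]]
    mask = prune_by_gradient(triggered, w, num_prune=1)
    print(mask)  # [1.0, 0.0, 1.0, 1.0]
    print(unlearning_loss(triggered, w, [1.0] * 4), unlearning_loss(triggered, w, mask))
```

Because the toy loss is linear in the mask, the first-order estimate is exact here. In a real CNN the gradient would come from backpropagation. The linear estimate would then hold only for small changes, so recomputing gradients between pruning steps would be a natural refinement.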

External datasets

CIFAR-10

German Traffic Sign Recognition Benchmark (GTSRB)