Unlearning Backdoor Attacks through Gradient-Based Model Pruning
Abstract
Amid growing concern over cybersecurity threats, defending against backdoor attacks is paramount to ensuring the integrity and reliability of machine learning models. However, many existing defenses require substantial amounts of data for effective mitigation, posing significant challenges in practical deployment. To address this, we propose a novel approach that counters backdoor attacks by treating their mitigation as an unlearning task. We tackle this challenge through a targeted model-pruning strategy that leverages unlearning loss gradients to identify and eliminate backdoor elements within the model. Built on solid theoretical insights, our approach is simple and effective, making it well suited to scenarios with limited data availability. Our methodology comprises formulating a suitable unlearning loss and devising a model-pruning technique tailored to convolutional neural networks. Comprehensive evaluations demonstrate the efficacy of the proposed approach against state-of-the-art methods, particularly in realistic data settings.
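The abstract describes the pipeline only at a high level: compute an unlearning loss on a small amount of data, use its gradients to flag backdoor-related elements, and prune those elements from the CNN. The sketch below is one minimal PyTorch reading of that idea, not the paper's actual method: the negated cross-entropy unlearning loss, the per-channel gradient-norm score, and the helper names (`unlearning_gradients`, `prune_top_channels`, `ratio`) are all illustrative assumptions.

```python
# Hypothetical sketch of gradient-guided channel pruning driven by an
# unlearning loss. Assumptions (not from the paper text): the defender
# holds a few suspected-poisoned samples; the unlearning loss is the
# negated cross-entropy on those samples; conv filters whose weights
# receive the largest unlearning-loss gradients are zeroed out.
import torch
import torch.nn as nn
import torch.nn.functional as F

def unlearning_gradients(model, poisoned_loader, device="cpu"):
    """Accumulate per-filter gradient magnitudes of the unlearning loss."""
    model.zero_grad()
    for x, y in poisoned_loader:
        x, y = x.to(device), y.to(device)
        # Illustrative unlearning loss: negating the cross-entropy means
        # descending this loss *removes* the behavior fit to these samples.
        loss = -F.cross_entropy(model(x), y)
        loss.backward()
    scores = {}
    for name, module in model.named_modules():
        if isinstance(module, nn.Conv2d) and module.weight.grad is not None:
            # One score per output channel: L2 norm of its weight gradient.
            g = module.weight.grad.detach()
            scores[name] = g.flatten(1).norm(dim=1)
    return scores

def prune_top_channels(model, scores, ratio=0.05):
    """Zero the conv filters with the largest unlearning-gradient scores."""
    with torch.no_grad():
        for name, module in model.named_modules():
            if name in scores:
                k = max(1, int(ratio * scores[name].numel()))
                idx = torch.topk(scores[name], k).indices
                module.weight[idx] = 0.0
                if module.bias is not None:
                    module.bias[idx] = 0.0
```

In practice a defense along these lines would follow the pruning step with a brief fine-tune on whatever small clean set is available to recover clean accuracy; the pruning fraction `ratio` is a hypothetical knob, not a value taken from the paper.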