Abstract
Parameter-efficient fine-tuning (PEFT) has become a key training strategy for
large language models. However, its reliance on fewer trainable parameters
poses security risks, such as task-agnostic backdoors. Despite their severe
impact on a wide range of tasks, there is no practical defense solution
available that effectively counters task-agnostic backdoors within the context
of PEFT. In this study, we introduce Obliviate, a PEFT-integrable backdoor
defense. We develop two techniques aimed at amplifying benign neurons within
PEFT layers and penalizing the influence of trigger tokens. Our evaluations
across three major PEFT architectures show that our method can significantly
reduce the attack success rate of the state-of-the-art task-agnostic backdoors
(83.6%$\downarrow$). Furthermore, our method exhibits robust defense
capabilities against both task-specific backdoors and adaptive attacks. Source
code can be obtained at https://github.com/obliviateARR/Obliviate.
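As a point of reference for the headline metric, the attack success rate (ASR) is commonly computed as the fraction of trigger-bearing inputs that the model assigns to the attacker's intended label. The sketch below illustrates that common definition only; the abstract does not state the exact formula used for task-agnostic backdoors, nor whether the reported reduction is absolute or relative. The function name, label values, and predictions are hypothetical and made up for illustration.

```python
from typing import Sequence


def attack_success_rate(predictions: Sequence[int], target_label: int) -> float:
    """Fraction of trigger-bearing inputs classified as the attacker's target label."""
    if not predictions:
        raise ValueError("predictions must be non-empty")
    hits = sum(1 for p in predictions if p == target_label)
    return hits / len(predictions)


# Hypothetical predictions on the same 10 triggered inputs (made-up values,
# not results from the paper).
undefended = [1, 1, 1, 1, 1, 1, 1, 1, 1, 0]
defended = [0, 0, 1, 0, 0, 0, 0, 0, 0, 0]

asr_before = attack_success_rate(undefended, target_label=1)
asr_after = attack_success_rate(defended, target_label=1)

# Assumption: the reduction is expressed here in absolute percentage points.
reduction_points = (asr_before - asr_after) * 100
print(f"ASR before: {asr_before:.1%}, after: {asr_after:.1%}, "
      f"reduction: {reduction_points:.1f} points")
```

A defense is then judged by how far it lowers this rate on triggered inputs while leaving accuracy on clean inputs largely intact.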
External Datasets
SST-2
AG News
Hate Speech and Offensive Language