On Trojan Signatures in Large Language Models of Code
Abstract
Trojan signatures, as described by Fields et al. (2021), are noticeable differences between the distribution of the trojaned class parameters (weights) and that of the non-trojaned class parameters of a trojaned model, which can be used to detect the trojaned model. Fields et al. (2021) found trojan signatures in computer vision classification tasks with image models such as ResNet, WideResNet, DenseNet, and VGG. In this paper, we investigate such signatures in the classifier layer parameters of large language models of source code. Our results suggest that trojan signatures do not generalize to LLMs of code. We found that trojaned code models are stubborn, in that they resist revealing such signatures even when the models were poisoned under more explicit settings (fine-tuned with the pre-trained weights frozen). We analyzed nine trojaned models for two binary classification tasks: clone detection and defect detection. To the best of our knowledge, this is the first work to examine weight-based trojan signature revelation techniques for large language models of code and, furthermore, to demonstrate that detecting trojans from the weights alone in such models is a hard problem.
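To make the idea of a weight-based signature concrete, the following is a minimal sketch of comparing per-class weight distributions in a classifier layer. It is an illustration under stated assumptions, not the procedure used in the paper or by Fields et al. (2021): the weight matrix is synthetic, the shift applied to class 1 is artificial, and the helper names `parzen_density` and `class_weight_shift`, the hidden size of 768, and the bandwidth are all hypothetical choices made for this example.

```python
import numpy as np


def parzen_density(samples, grid, bandwidth):
    """Gaussian Parzen-window (kernel) density estimate of `samples` on `grid`."""
    diffs = (grid[:, None] - samples[None, :]) / bandwidth
    kernel = np.exp(-0.5 * diffs ** 2) / np.sqrt(2.0 * np.pi)
    return kernel.mean(axis=1) / bandwidth


def class_weight_shift(classifier_weights):
    """For each class row, return its mean weight minus the mean of all other rows."""
    w = np.asarray(classifier_weights, dtype=float)
    shifts = np.empty(w.shape[0])
    for c in range(w.shape[0]):
        others = np.delete(w, c, axis=0)
        shifts[c] = w[c].mean() - others.mean()
    return shifts


# Synthetic stand-in for a binary classifier head (2 classes x 768 hidden units).
# Assumption: real models would supply this matrix from their final linear layer.
rng = np.random.default_rng(0)
weights = rng.normal(0.0, 0.02, size=(2, 768))
weights[1] += 0.01  # artificial shift on class 1, imitating a visible signature

shifts = class_weight_shift(weights)
grid = np.linspace(-0.1, 0.1, 201)
densities = [parzen_density(row, grid, bandwidth=0.005) for row in weights]

print("per-class mean shift:", shifts)
print("density mode per class:", [float(grid[np.argmax(d)]) for d in densities])
```

In this synthetic case the shifted class stands out by construction. The abstract's finding is that the trojaned code models studied did not show such a separation in their classifier layer weights, so a check of this kind would not be expected to flag them.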
Trojan signatures in DNN weights
Greg Fields, Mohammad Samragh, Mojan Javaheripi, Farinaz Koushanfar, Tara Javidi
Published: 2021
You autocomplete me: Poisoning vulnerabilities in neural code completion
R. Schuster, C. Song, E. Tromer, V. Shmatikov
Published: 2021
Backdoor learning: A survey
Yiming Li, Yong Jiang, Zhifeng Li, Shu-Tao Xia
Published: 2022
Non-parametric density estimation
Francesco Corona
Published: 2017
On estimation of a probability density function and mode
Emanuel Parzen
Published: 1962
Devign: Effective Vulnerability Identification by Learning Comprehensive Program Semantics via Graph Neural Networks
Yaqin Zhou, Shangqing Liu, Jingkai Siow, Xiaoning Du, Yang Liu
Published: 2019-09-09
Towards a big data curated benchmark of inter-project code clones
Jeffrey Svajlenko, Judith F. Islam, Iman Keivanloo, Chanchal K. Roy, Mohammad Mamun Mia
Published: 2014
Spectral signatures in backdoor attacks
Brandon Tran, Jerry Li, Aleksander Madry
Published: 2018
You see what I want you to see: poisoning vulnerabilities in neural code search
Yao Wan, Shijie Zhang, Hongyu Zhang, Yulei Sui, Guandong Xu, Dezhong Yao, Hai Jin, Lichao Sun
Published: 2022
Stealthy backdoor attack for code models
Zhou Yang, Bowen Xu, Jie M. Zhang, Hong Jin Kang, Jieke Shi, Junda He, David Lo
Published: 2023
Mitigating backdoor attacks in LSTM-based Text Classification Systems by Backdoor Keyword Identification
Chuanshuai Chen, Jiazhu Dai
Published: 2020-07-11