On Trojan Signatures in Large Language Models of Code | AI Security Portal


arxiv

On Trojan Signatures in Large Language Models of Code

AI Security Portal bot

Information in the literature database is collected automatically.

Source

https://arxiv.org/abs/2402.16896

PDF

https://arxiv.org/pdf/2402.16896

Literature Information

Authors: Aftab Hussain; Md Rafiqul Islam Rabin; Mohammad Amin Alipour
Published: 2024-2-24
Updated: 2024-3-8
Affiliation: University of Houston
Country: United States of America
Conference:

Labels Estimated by AI

Trojan detection, Trojan signatures, LLM security

Note: These labels were added automatically by AI and may therefore be inaccurate.
For details, see About the Literature Database.

Abstract

Trojan signatures, as described by Fields et al. (2021), are noticeable differences between the distribution of the trojaned class parameters (weights) and that of the non-trojaned class parameters of a trojaned model, which can be used to detect the trojaned model. Fields et al. (2021) found trojan signatures in computer vision classification tasks with image models such as ResNet, WideResNet, DenseNet, and VGG. In this paper, we investigate such signatures in the classifier layer parameters of large language models of source code. Our results suggest that trojan signatures do not generalize to LLMs of code. We found that trojaned code models are stubborn, even when the models were poisoned under more explicit settings (finetuned with pre-trained weights frozen). We analyzed nine trojaned models for two binary classification tasks: clone and defect detection. To the best of our knowledge, this is the first work to examine weight-based trojan signature revelation techniques for large language models of code and, furthermore, to demonstrate that detecting trojans only from the weights in such models is a hard problem.
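To make the idea of a weight-based signature concrete, the following is a minimal sketch, not the procedure of Fields et al. or of this paper. It assumes the final classification layer is available as a matrix with one row of weights per class, and it scores each class by how far the mean of its weights sits from the mean of all other classes' weights. The function name `class_weight_shift`, the synthetic weights, the layer shape, and the size of the artificial shift are all assumptions made for illustration.

```python
import numpy as np


def class_weight_shift(weights: np.ndarray) -> np.ndarray:
    """Standardized shift of each class's weights against all other classes.

    weights: array of shape (num_classes, hidden_dim), e.g. the weight
    matrix of a final linear classification layer (hypothetical input).
    """
    num_classes = weights.shape[0]
    shifts = np.empty(num_classes)
    for c in range(num_classes):
        own = weights[c]
        rest = np.delete(weights, c, axis=0).ravel()
        # Difference of means, scaled by the spread of the other classes.
        shifts[c] = (own.mean() - rest.mean()) / rest.std()
    return shifts


# Synthetic stand-in for a classifier layer (an assumption, not real model weights).
rng = np.random.default_rng(0)
clean = rng.normal(loc=0.0, scale=0.02, size=(10, 768))

# Hypothetical "trojaned" layer: the target class row is shifted upward.
shifted = clean.copy()
target_class = 3
shifted[target_class] += 0.02

clean_shifts = class_weight_shift(clean)
trojan_shifts = class_weight_shift(shifted)
print("clean max |shift|:", np.abs(clean_shifts).max())
print("shifted layer, most shifted class:", int(trojan_shifts.argmax()))
```

On this synthetic data the shifted row stands out by construction. The abstract reports that signatures of this kind did not carry over to the code models studied, so a statistic like this one may not separate the classes on real weights from such models.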

External Datasets

TrojAI dataset

Devign C/C++

BigCloneBench Java

References

Proceedings of the IEEE/CVF International Conference on Computer Vision

Trojan signatures in DNN weights

Greg Fields, Mohammad Samragh, Mojan Javaheripi, Farinaz Koushanfar, Tara Javidi

Published: 2021

BenchCouncil Transactions on Benchmarks, Standards and Evaluations

An era of chatgpt as a significant futuristic support tool: A study on features, abilities, and challenges

Abid Haleem, Mohd Javaid, Ravi Pratap Singh

Published: 2022

Github copilot is now public

Published: 2022

USENIX Security

You autocomplete me: Poisoning vulnerabilities in neural code completion

R. Schuster, C. Song, E. Tromer, V. Shmatikov

Published: 2021

A survey of trojans in neural models of source code: Taxonomy and techniques

Aftab Hussain, Md. Rafiqul Islam Rabin, Toufique Ahmed, Bowen Xu, Prem Devanbu, Mohammad Amin Alipour

Published: 2023

IEEE Transactions on Neural Networks and Learning Systems

Backdoor learning: A survey

Yiming Li, Yong Jiang, Zhifeng Li, Shu-Tao Xia

Published: 2022

Non-parametric density estimation

Francesco Corona

Published: 2017

The Annals of Mathematical Statistics

On estimation of a probability density function and mode

Emanuel Parzen

Published: 1962

TrojanedCM: A repository for poisoned neural models of source code

Aftab Hussain, Md. Rafiqul Islam Rabin, Mohammad Amin Alipour

Published: 2023

Conference on Neural Information Processing Systems (NeurIPS)

Devign: Effective Vulnerability Identification by Learning Comprehensive Program Semantics via Graph Neural Networks

Yaqin Zhou, Shangqing Liu, Jingkai Siow, Xiaoning Du, Yang Liu

Published: 2019.9.9

Vulnerability identification is crucial to protect software systems from cyber attacks. It is especially important to localize the vulnerable functions among the source code to facilitate the fix. However, it is a challenging and tedious process, and also requires specialized security expertise. Inspired by the work on manually-defined patterns of vulnerabilities from various code representation graphs and the recent advances in graph neural networks, we propose Devign, a general graph neural network based model for graph-level classification through learning on a rich set of code semantic representations. It includes a novel Conv module to efficiently extract useful features in the learned rich node representations for graph-level classification. The model is trained over manually labeled datasets built on 4 diversified large-scale open-source C projects that incorporate the high complexity and variety of real source code instead of the synthetic code used in previous works. The results of the extensive evaluation on the datasets demonstrate that Devign significantly outperforms the state of the art, with an average of 10.51% higher accuracy and 8.68% higher F1 score, and that the Conv module contributes an average increase of 4.66% in accuracy and 6.37% in F1.

Data-driven vulnerability assessment, Graph construction, Machine learning

Proceedings of the 2014 IEEE International Conference on Software Maintenance and Evolution

Towards a big data curated benchmark of inter-project code clones

Jeffrey Svajlenko, Judith F. Islam, Iman Keivanloo, Chanchal K. Roy, Mohammad Mamun Mia

Published: 2014

Proceedings of the Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems (NeurIPS)

Spectral signatures in backdoor attacks

Brandon Tran, Jerry Li, Aleksander Madry

Published: 2018

International Conference on Pattern Recognition (ICPR)

Backdoors in Neural Models of Source Code

Goutham Ramakrishnan, Aws Albarghouthi

Published: 2020.6.12

Deep neural networks are vulnerable to a range of adversaries. A particularly pernicious class of vulnerabilities are backdoors, where model predictions diverge in the presence of subtle triggers in inputs. An attacker can implant a backdoor by poisoning the training data to yield a desired target prediction on triggered inputs. We study backdoors in the context of deep-learning for source code. (1) We define a range of backdoor classes for source-code tasks and show how to poison a dataset to install such backdoors. (2) We adapt and improve recent algorithms from robust statistics for our setting, showing that backdoors leave a spectral signature in the learned representation of source code, thus enabling detection of poisoned data. (3) We conduct a thorough evaluation on different architectures and languages, showing the ease of injecting backdoors and our ability to eliminate them.

Backdoor attack, Poisoning, Program analysis

Proceedings of the 30th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering

You see what I want you to see: poisoning vulnerabilities in neural code search

Yao Wan, Shijie Zhang, Hongyu Zhang, Yulei Sui, Guandong Xu, Dezhong Yao, Hai Jin, Lichao Sun

Published: 2022

Stealthy backdoor attack for code models

Zhou Yang, Bowen Xu, Jie M. Zhang, Hong Jin Kang, Jieke Shi, Junda He, David Lo

Published: 2023

Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Backdooring neural code search

Weisong Sun, Yuchen Chen, Guanhong Tao, Chunrong Fang, Xiangyu Zhang, Quanjun Zhang, Bin Luo

Published: 2023

Mitigating backdoor attacks in LSTM-based Text Classification Systems by Backdoor Keyword Identification

Chuanshuai Chen, Jiazhu Dai

Published: 2020.7.11

Deep neural networks have been shown to face a new threat called backdoor attacks, in which the adversary injects backdoors into the neural network model by poisoning the training dataset. When the input contains a special pattern called the backdoor trigger, the backdoored model carries out a malicious task, such as a misclassification specified by the adversary. In text classification systems, backdoors inserted in the models can cause spam or malicious speech to escape detection. Previous work has mainly focused on defending against backdoor attacks in computer vision; little attention has been paid to defense methods for RNN backdoor attacks on text classification. In this paper, by analyzing the changes in inner LSTM neurons, we propose a defense method called Backdoor Keyword Identification (BKI) to mitigate backdoor attacks that the adversary performs against LSTM-based text classification by data poisoning. This method can identify and exclude poisoning samples crafted to insert a backdoor into the model from training data without a verified and trusted dataset. We evaluate our method on four different text classification datasets: IMDB, DBpedia ontology, 20 newsgroups, and Reuters-21578. It achieves good performance on all of them regardless of the trigger sentences.

Poisoning, Backdoor attack, Text generation methods

Badcs: A backdoor attack framework for code search

Shiyi Qi, Yuanhang Yang, Shuzheng Gao, Cuiyun Gao, Zenglin Xu

Published: 2023

Conference on Empirical Methods in Natural Language Processing (EMNLP)

ONION: A Simple and Effective Defense Against Textual Backdoor Attacks

Fanchao Qi, Yangyi Chen, Mukai Li, Yuan Yao, Zhiyuan Liu, Maosong Sun

Published: 2020.11.20

Backdoor attacks are a kind of emergent training-time threat to deep neural networks (DNNs). They can manipulate the output of DNNs and possess high insidiousness. In the field of natural language processing, some attack methods have been proposed and achieve very high attack success rates on multiple popular models. Nevertheless, there are few studies on defending against textual backdoor attacks. In this paper, we propose a simple and effective textual backdoor defense named ONION, which is based on outlier word detection and, to the best of our knowledge, is the first method that can handle all the textual backdoor attack situations. Experiments demonstrate the effectiveness of our model in defending BiLSTM and BERT against five different backdoor attacks. All the code and data of this paper can be obtained at https://github.com/thunlp/ONION.

Text perturbation methods, Trigger detection, Backdoored model detection

Occlusion-based detection of trojan-triggering inputs in large language models of code

Aftab Hussain, Md. Rafiqul Islam Rabin, Toufique Ahmed, Mohammad Amin Alipour, Bowen Xu

Published: 2023