LLM Security

Refusing Safe Prompts for Multi-modal Large Language Models

Authors: Zedian Shao, Hongbin Liu, Yuepeng Hu, Neil Zhenqiang Gong | Published: 2024-07-12 | Updated: 2024-09-05
LLM Security
Prompt Injection
Evaluation Methods

CleanGen: Mitigating Backdoor Attacks for Generation Tasks in Large Language Models

Authors: Yuetai Li, Zhangchen Xu, Fengqing Jiang, Luyao Niu, Dinuka Sahabandu, Bhaskar Ramasubramanian, Radha Poovendran | Published: 2024-06-18 | Updated: 2025-03-27
LLM Security
Backdoor Attack
Prompt Injection

ChatBug: A Common Vulnerability of Aligned LLMs Induced by Chat Templates

Authors: Fengqing Jiang, Zhangchen Xu, Luyao Niu, Bill Yuchen Lin, Radha Poovendran | Published: 2024-06-17 | Updated: 2025-01-07
LLM Security
Prompt Injection
Vulnerability Management

Threat Modelling and Risk Analysis for Large Language Model (LLM)-Powered Applications

Authors: Stephen Burabari Tete | Published: 2024-06-16
LLM Security
Prompt Injection
Risk Management

Emerging Safety Attack and Defense in Federated Instruction Tuning of Large Language Models

Authors: Rui Ye, Jingyi Chai, Xiangrui Liu, Yaodong Yang, Yanfeng Wang, Siheng Chen | Published: 2024-06-15
LLM Security
Prompt Injection
Poisoning

RL-JACK: Reinforcement Learning-powered Black-box Jailbreaking Attack against LLMs

Authors: Xuan Chen, Yuzhou Nie, Lu Yan, Yunshu Mao, Wenbo Guo, Xiangyu Zhang | Published: 2024-06-13
LLM Security
Prompt Injection
Reinforcement Learning

Dataset and Lessons Learned from the 2024 SaTML LLM Capture-the-Flag Competition

Authors: Edoardo Debenedetti, Javier Rando, Daniel Paleka, Silaghi Fineas Florin, Dragos Albastroiu, Niv Cohen, Yuval Lemberg, Reshmi Ghosh, Rui Wen, Ahmed Salem, Giovanni Cherubin, Santiago Zanella-Beguelin, Robin Schmid, Victor Klemm, Takahiro Miki, Chenhao Li, Stefan Kraft, Mario Fritz, Florian Tramèr, Sahar Abdelnabi, Lea Schönherr | Published: 2024-06-12
LLM Security
Prompt Injection
Defense Methods

A Study of Backdoors in Instruction Fine-tuned Language Models

Authors: Jayaram Raghuram, George Kesidis, David J. Miller | Published: 2024-06-12 | Updated: 2024-08-21
LLM Security
Backdoor Attack
Defense Methods

Knowledge Return Oriented Prompting (KROP)

Authors: Jason Martin, Kenneth Yeung | Published: 2024-06-11
LLM Security
Prompt Injection
Attack Methods

A Survey of Recent Backdoor Attacks and Defenses in Large Language Models

Authors: Shuai Zhao, Meihuizi Jia, Zhongliang Guo, Leilei Gan, Xiaoyu Xu, Xiaobao Wu, Jie Fu, Yichao Feng, Fengjun Pan, Luu Anh Tuan | Published: 2024-06-10 | Updated: 2025-01-04
LLM Security
Backdoor Attack