Literature Database

A Survey on Responsible LLMs: Inherent Risk, Malicious Use, and Mitigation Strategy
Authors: Huandong Wang, Wenjie Fu, Yingzhou Tang, Zhilong Chen, Yuxi Huang, Jinghua Piao, Chen Gao, Fengli Xu, Tao Jiang, Yong Li | Published: 2025-01-16
Tags: Survey Paper, Privacy Protection, Prompt Injection, Large Language Model
Tag&Tab: Pretraining Data Detection in Large Language Models Using Keyword-Based Membership Inference Attack
Authors: Sagiv Antebi, Edan Habler, Asaf Shabtai, Yuval Elovici | Published: 2025-01-14
Tags: Cybersecurity, Privacy Protection, Large Language Model
Exploring and Mitigating Adversarial Manipulation of Voting-Based Leaderboards
Authors: Yangsibo Huang, Milad Nasr, Anastasios Angelopoulos, Nicholas Carlini, Wei-Lin Chiang, Christopher A. Choquette-Choo, Daphne Ippolito, Matthew Jagielski, Katherine Lee, Ken Ziyu Liu, Ion Stoica, Florian Tramer, Chiyuan Zhang | Published: 2025-01-13
Tags: Cybersecurity, Large Language Model, Attack Evaluation
SATA: A Paradigm for LLM Jailbreak via Simple Assistive Task Linkage
Authors: Xiaoning Dong, Wenbo Hu, Wei Xu, Tianxing He | Published: 2024-12-19 | Updated: 2025-03-21
Tags: Prompt Injection, Large Language Model, Adversarial Learning
Towards Action Hijacking of Large Language Model-based Agent
Authors: Yuyang Zhang, Kangjie Chen, Jiaxin Gao, Ronghao Cui, Run Wang, Lina Wang, Tianwei Zhang | Published: 2024-12-14 | Updated: 2025-06-12
Tags: Performance Evaluation, Prompt Leaking, Large Language Model
"Moralized" Multi-Step Jailbreak Prompts: Black-Box Testing of Guardrails in Large Language Models for Verbal Attacks
Authors: Libo Wang | Published: 2024-11-23 | Updated: 2025-03-20
Tags: Prompt Injection, Large Language Model
JailbreakLens: Interpreting Jailbreak Mechanism in the Lens of Representation and Circuit
Authors: Zeqing He, Zhibo Wang, Zhixuan Chu, Huiyu Xu, Wenhui Zhang, Qinglong Wang, Rui Zheng | Published: 2024-11-17 | Updated: 2025-04-24
Tags: Disabling Safety Mechanisms of LLM, Prompt Injection, Large Language Model
Attention Tracker: Detecting Prompt Injection Attacks in LLMs
Authors: Kuo-Han Hung, Ching-Yun Ko, Ambrish Rawat, I-Hsin Chung, Winston H. Hsu, Pin-Yu Chen | Published: 2024-11-01 | Updated: 2025-04-23
Tags: Indirect Prompt Injection, Large Language Model, Attention Mechanism
Code Vulnerability Repair with Large Language Model using Context-Aware Prompt Tuning
Authors: Arshiya Khan, Guannan Liu, Xing Gao | Published: 2024-09-27 | Updated: 2025-06-11
Tags: Code Vulnerability Repair, Security Context Integration, Large Language Model
Hide Your Malicious Goal Into Benign Narratives: Jailbreak Large Language Models through Carrier Articles
Authors: Zhilong Wang, Haizhou Wang, Nanqing Luo, Lan Zhang, Xiaoyan Sun, Yebo Cao, Peng Liu | Published: 2024-08-20 | Updated: 2025-02-07
Tags: Prompt Injection, Large Language Model, Attack Scenario Analysis