Better Privilege Separation for Agents by Restricting Data Types
Authors: Dennis Jacob, Emad Alghamdi, Zhanhao Hu, Basel Alomair, David Wagner | Published: 2025-09-30
Tags: Indirect Prompt Injection, Security Strategy Generation, Malicious Prompt
Literature Database

QGuard: Question-based Zero-shot Guard for Multi-modal LLM Safety
Authors: Taegyeong Lee, Jeonghwa Yoo, Hyoungseo Cho, Soo Yong Kim, Yunho Maeng | Published: 2025-06-14 | Updated: 2025-09-30
Tags: Alignment, Ethical Statement, Malicious Prompt

STShield: Single-Token Sentinel for Real-Time Jailbreak Detection in Large Language Models
Authors: Xunguang Wang, Wenxuan Wang, Zhenlan Ji, Zongjie Li, Pingchuan Ma, Daoyuan Wu, Shuai Wang | Published: 2025-03-23
Tags: Prompt Injection, Malicious Prompt, Effectiveness Analysis of Defense Methods

Align in Depth: Defending Jailbreak Attacks via Progressive Answer Detoxification
Authors: Yingjie Zhang, Tong Liu, Zhe Zhao, Guozhu Meng, Kai Chen | Published: 2025-03-14
Tags: Disabling Safety Mechanisms of LLM, Prompt Injection, Malicious Prompt

Can Indirect Prompt Injection Attacks Be Detected and Removed?
Authors: Yulin Chen, Haoran Li, Yuan Sui, Yufei He, Yue Liu, Yangqiu Song, Bryan Hooi | Published: 2025-02-23
Tags: Prompt Validation, Malicious Prompt, Attack Method

Dagger Behind Smile: Fool LLMs with a Happy Ending Story
Authors: Xurui Song, Zhixin Xie, Shuo Huai, Jiayi Kong, Jun Luo | Published: 2025-01-19 | Updated: 2025-09-30
Tags: Disabling Safety Mechanisms of LLM, Malicious Prompt, Effectiveness of Attack Methods

Toxicity Detection for Free
Authors: Zhanhao Hu, Julien Piet, Geng Zhao, Jiantao Jiao, David Wagner | Published: 2024-05-29 | Updated: 2024-11-08
Tags: Indirect Prompt Injection, Prompt Validation, Malicious Prompt

Defending Against Indirect Prompt Injection Attacks With Spotlighting
Authors: Keegan Hines, Gary Lopez, Matthew Hall, Federico Zarfati, Yonatan Zunger, Emre Kiciman | Published: 2024-03-20
Tags: Indirect Prompt Injection, Prompt Injection, Malicious Prompt

Reconstruct Your Previous Conversations! Comprehensively Investigating Privacy Leakage Risks in Conversations with GPT Models
Authors: Junjie Chu, Zeyang Sha, Michael Backes, Yang Zhang | Published: 2024-02-05 | Updated: 2024-10-07
Tags: Privacy Protection, Prompt Injection, Malicious Prompt

Benchmarking and Defending Against Indirect Prompt Injection Attacks on Large Language Models
Authors: Jingwei Yi, Yueqi Xie, Bin Zhu, Emre Kiciman, Guangzhong Sun, Xing Xie, Fangzhao Wu | Published: 2023-12-21 | Updated: 2025-01-27
Tags: Indirect Prompt Injection, Malicious Prompt, Vulnerability Analysis