STShield: Single-Token Sentinel for Real-Time Jailbreak Detection in Large Language Models | Authors: Xunguang Wang, Wenxuan Wang, Zhenlan Ji, Zongjie Li, Pingchuan Ma, Daoyuan Wu, Shuai Wang | Published: 2025-03-23 | Tags: Prompt Injection, Malicious Prompt, Effectiveness Analysis of Defense Methods
BadToken: Token-level Backdoor Attacks to Multi-modal Large Language Models | Authors: Zenghui Yuan, Jiawen Shi, Pan Zhou, Neil Zhenqiang Gong, Lichao Sun | Published: 2025-03-20 | Tags: Backdoor Attack, Prompt Injection, Large Language Model
Detecting LLM-Generated Peer Reviews | Authors: Vishisht Rao, Aounon Kumar, Himabindu Lakkaraju, Nihar B. Shah | Published: 2025-03-20 | Updated: 2025-05-19 | Tags: Prompt Injection, Digital Watermarking for Generative AI, Watermark Design
Towards Understanding the Safety Boundaries of DeepSeek Models: Evaluation and Findings | Authors: Zonghao Ying, Guangyi Zheng, Yongxin Huang, Deyue Zhang, Wenxin Zhang, Quanchen Zou, Aishan Liu, Xianglong Liu, Dacheng Tao | Published: 2025-03-19 | Tags: Prompt Injection, Large Language Model, Attack Method
Temporal Context Awareness: A Defense Framework Against Multi-turn Manipulation Attacks on Large Language Models | Authors: Prashant Kulkarni, Assaf Namer | Published: 2025-03-18 | Tags: Prompt Injection, Prompt Leaking, Attack Method
MirrorShield: Towards Universal Defense Against Jailbreaks via Entropy-Guided Mirror Crafting | Authors: Rui Pu, Chaozhuo Li, Rui Ha, Litian Zhang, Lirong Qiu, Xi Zhang | Published: 2025-03-17 | Updated: 2025-05-20 | Tags: Prompt Injection, Large Language Model, Attack Method
Align in Depth: Defending Jailbreak Attacks via Progressive Answer Detoxification | Authors: Yingjie Zhang, Tong Liu, Zhe Zhao, Guozhu Meng, Kai Chen | Published: 2025-03-14 | Tags: Disabling Safety Mechanisms of LLM, Prompt Injection, Malicious Prompt
Improving LLM Safety Alignment with Dual-Objective Optimization | Authors: Xuandong Zhao, Will Cai, Tianneng Shi, David Huang, Licong Lin, Song Mei, Dawn Song | Published: 2025-03-05 | Updated: 2025-06-12 | Tags: Prompt Injection, Robustness Improvement Method, Trade-Off Between Safety and Usability
Beyond Surface-Level Patterns: An Essence-Driven Defense Framework Against Jailbreak Attacks in LLMs | Authors: Shiyu Xiang, Ansen Zhang, Yanfei Cao, Yang Fan, Ronghao Chen | Published: 2025-02-26 | Updated: 2025-05-28 | Tags: LLM Security, Prompt Injection, Attack Evaluation
GuidedBench: Measuring and Mitigating the Evaluation Discrepancies of In-the-wild LLM Jailbreak Methods | Authors: Ruixuan Huang, Xunguang Wang, Zongjie Li, Daoyuan Wu, Shuai Wang | Published: 2025-02-24 | Updated: 2025-07-09 | Tags: Prompt Injection, Jailbreak Method, Evaluation Method