Prompt, Divide, and Conquer: Bypassing Large Language Model Safety Filters via Segmented and Distributed Prompt Processing Authors: Johan Wahréus, Ahmed Hussain, Panos Papadimitratos | Published: 2025-03-27 System DevelopmentPrompt InjectionLarge Language Model 2025.03.27 2025.05.27 Literature Database
BadToken: Token-level Backdoor Attacks to Multi-modal Large Language Models Authors: Zenghui Yuan, Jiawen Shi, Pan Zhou, Neil Zhenqiang Gong, Lichao Sun | Published: 2025-03-20 Backdoor AttackPrompt InjectionLarge Language Model 2025.03.20 2025.05.27 Literature Database
Towards Understanding the Safety Boundaries of DeepSeek Models: Evaluation and Findings Authors: Zonghao Ying, Guangyi Zheng, Yongxin Huang, Deyue Zhang, Wenxin Zhang, Quanchen Zou, Aishan Liu, Xianglong Liu, Dacheng Tao | Published: 2025-03-19 Prompt InjectionLarge Language ModelAttack Method 2025.03.19 2025.05.27 Literature Database
MirrorShield: Towards Universal Defense Against Jailbreaks via Entropy-Guided Mirror Crafting Authors: Rui Pu, Chaozhuo Li, Rui Ha, Litian Zhang, Lirong Qiu, Xi Zhang | Published: 2025-03-17 | Updated: 2025-05-20 Prompt InjectionLarge Language ModelAttack Method 2025.03.17 2025.05.27 Literature Database
Probabilistic Modeling of Jailbreak on Multimodal LLMs: From Quantification to Application Authors: Wenzhuo Xu, Zhipeng Wei, Xiongtao Sun, Zonghao Ying, Deyue Zhang, Dongdong Yang, Xiangzheng Zhang, Quanchen Zou | Published: 2025-03-10 | Updated: 2025-07-31 Prompt InjectionLarge Language ModelRobustness of Watermarking Techniques 2025.03.10 2025.08.02 Literature Database
A Mousetrap: Fooling Large Reasoning Models for Jailbreak with Chain of Iterative Chaos Authors: Yang Yao, Xuan Tong, Ruofan Wang, Yixu Wang, Lujundong Li, Liang Liu, Yan Teng, Yingchun Wang | Published: 2025-02-19 | Updated: 2025-06-03 Disabling Safety Mechanisms of LLMEthical ConsiderationsLarge Language Model 2025.02.19 2025.06.05 Literature Database
“Short-length” Adversarial Training Helps LLMs Defend “Long-length” Jailbreak Attacks: Theoretical and Empirical Evidence Authors: Shaopeng Fu, Liang Ding, Di Wang | Published: 2025-02-06 Prompt InjectionLarge Language ModelAdversarial Training 2025.02.06 2025.05.27 Literature Database
LLM Safety Alignment is Divergence Estimation in Disguise Authors: Rajdeep Haldar, Ziyi Wang, Qifan Song, Guang Lin, Yue Xing | Published: 2025-02-02 Prompt InjectionConvergence AnalysisLarge Language ModelSafety Alignment 2025.02.02 2025.05.27 Literature Database
A Survey on Responsible LLMs: Inherent Risk, Malicious Use, and Mitigation Strategy Authors: Huandong Wang, Wenjie Fu, Yingzhou Tang, Zhilong Chen, Yuxi Huang, Jinghua Piao, Chen Gao, Fengli Xu, Tao Jiang, Yong Li | Published: 2025-01-16 Survey PaperPrivacy ProtectionPrompt InjectionLarge Language Model 2025.01.16 2025.05.27 Literature Database
Tag&Tab: Pretraining Data Detection in Large Language Models Using Keyword-Based Membership Inference Attack Authors: Sagiv Antebi, Edan Habler, Asaf Shabtai, Yuval Elovici | Published: 2025-01-14 CybersecurityPrivacy ProtectionLarge Language Model 2025.01.14 2025.05.27 Literature Database