Prompt, Divide, and Conquer: Bypassing Large Language Model Safety Filters via Segmented and Distributed Prompt Processing | Authors: Johan Wahréus, Ahmed Hussain, Panos Papadimitratos | Published: 2025-03-27 | System Development, Prompt Injection, Large Language Model | 2025.03.27 2025.05.12 | Literature Database
Defeating Prompt Injections by Design | Authors: Edoardo Debenedetti, Ilia Shumailov, Tianqi Fan, Jamie Hayes, Nicholas Carlini, Daniel Fabian, Christoph Kern, Chongyang Shi, Andreas Terzis, Florian Tramèr | Published: 2025-03-24 | Indirect Prompt Injection, Prompt Injection | 2025.03.24 2025.05.12
Large Language Models powered Network Attack Detection: Architecture, Opportunities and Case Study | Authors: Xinggong Zhang, Qingyang Li, Yunpeng Tan, Zongming Guo, Lei Zhang, Yong Cui | Published: 2025-03-24 | Prompt Injection, Prompt leaking, Intrusion Detection System | 2025.03.24 2025.05.12
Knowledge Transfer from LLMs to Provenance Analysis: A Semantic-Augmented Method for APT Detection | Authors: Fei Zuo, Junghwan Rhee, Yung Ryn Choe | Published: 2025-03-24 | Cyber Threat Intelligence, Prompt Injection, Information Extraction | 2025.03.24 2025.05.12
STShield: Single-Token Sentinel for Real-Time Jailbreak Detection in Large Language Models | Authors: Xunguang Wang, Wenxuan Wang, Zhenlan Ji, Zongjie Li, Pingchuan Ma, Daoyuan Wu, Shuai Wang | Published: 2025-03-23 | Prompt Injection, Malicious Prompt, Effectiveness Analysis of Defense Methods | 2025.03.23 2025.05.12
BadToken: Token-level Backdoor Attacks to Multi-modal Large Language Models | Authors: Zenghui Yuan, Jiawen Shi, Pan Zhou, Neil Zhenqiang Gong, Lichao Sun | Published: 2025-03-20 | Backdoor Attack, Prompt Injection, Large Language Model | 2025.03.20 2025.05.12
Detecting LLM-Written Peer Reviews | Authors: Vishisht Rao, Aounon Kumar, Himabindu Lakkaraju, Nihar B. Shah | Published: 2025-03-20 | Prompt Injection, Digital Watermarking for Generative AI, Watermark Design | 2025.03.20 2025.05.12
Towards Understanding the Safety Boundaries of DeepSeek Models: Evaluation and Findings | Authors: Zonghao Ying, Guangyi Zheng, Yongxin Huang, Deyue Zhang, Wenxin Zhang, Quanchen Zou, Aishan Liu, Xianglong Liu, Dacheng Tao | Published: 2025-03-19 | Prompt Injection, Large Language Model, Attack Method | 2025.03.19 2025.05.12
Temporal Context Awareness: A Defense Framework Against Multi-turn Manipulation Attacks on Large Language Models | Authors: Prashant Kulkarni, Assaf Namer | Published: 2025-03-18 | Prompt Injection, Prompt leaking, Attack Method | 2025.03.18 2025.05.12
MirrorGuard: Adaptive Defense Against Jailbreaks via Entropy-Guided Mirror Crafting | Authors: Rui Pu, Chaozhuo Li, Rui Ha, Litian Zhang, Lirong Qiu, Xi Zhang | Published: 2025-03-17 | Prompt Injection, Large Language Model, Attack Method | 2025.03.17 2025.05.12