Odysseus: Jailbreaking Commercial Multimodal LLM-integrated Systems via Dual Steganography
Authors: Songze Li, Jiameng Cheng, Yiming Li, Xiaojun Jia, Dacheng Tao | Published: 2025-12-23
Tags: Disabling Safety Mechanisms of LLM, Prompt Injection, Multimodal Safety

Can LLMs Make (Personalized) Access Control Decisions?
Authors: Friederike Groschupp, Daniele Lain, Aritra Dhar, Lara Magdalena Lazier, Srdjan Čapkun | Published: 2025-11-25
Tags: Disabling Safety Mechanisms of LLM, Privacy Assessment, Prompt Injection

Understanding and Mitigating Over-refusal for Large Language Models via Safety Representation
Authors: Junbo Zhang, Ran Chen, Qianli Zhou, Xinyang Deng, Wen Jiang | Published: 2025-11-24
Tags: Disabling Safety Mechanisms of LLM, Prompt Injection, Malicious Prompt

Black-Box Guardrail Reverse-engineering Attack
Authors: Hongwei Yao, Yun Xia, Shuo Shao, Haoran Shi, Tong Qiao, Cong Wang | Published: 2025-11-06
Tags: Disabling Safety Mechanisms of LLM, Prompt Leaking, Information Security

Death by a Thousand Prompts: Open Model Vulnerability Analysis
Authors: Amy Chang, Nicholas Conley, Harish Santhanalakshmi Ganesan, Adam Swanda | Published: 2025-11-05
Tags: Disabling Safety Mechanisms of LLM, Indirect Prompt Injection, Threat Modeling

Multimodal Safety Is Asymmetric: Cross-Modal Exploits Unlock Black-Box MLLMs Jailbreaks
Authors: Xinkai Wang, Beibei Li, Zerui Shao, Ao Liu, Shouling Ji | Published: 2025-10-20
Tags: Disabling Safety Mechanisms of LLM, Prompt Injection, Malicious Content Generation

Proactive defense against LLM Jailbreak
Authors: Weiliang Zhao, Jinjun Peng, Daniel Ben-Levi, Zhou Yu, Junfeng Yang | Published: 2025-10-06
Tags: Disabling Safety Mechanisms of LLM, Prompt Injection, Integration of Defense Methods

LLM Watermark Evasion via Bias Inversion
Authors: Jeongyeon Hwang, Sangdon Park, Jungseul Ok | Published: 2025-09-27 | Updated: 2025-10-01
Tags: Disabling Safety Mechanisms of LLM, Model Inversion, Statistical Testing

Backdoor Attribution: Elucidating and Controlling Backdoor in Language Models
Authors: Miao Yu, Zhenhong Zhou, Moayad Aloqaily, Kun Wang, Biwei Huang, Stephen Wang, Yueming Jin, Qingsong Wen | Published: 2025-09-26 | Updated: 2025-09-30
Tags: Disabling Safety Mechanisms of LLM, Self-Attention Mechanism, Interpretability

RLCracker: Exposing the Vulnerability of LLM Watermarks with Adaptive RL Attacks
Authors: Hanbo Huang, Yiran Zhang, Hao Zheng, Xuan Gong, Yihan Li, Lin Liu, Shiyu Liang | Published: 2025-09-25
Tags: Disabling Safety Mechanisms of LLM, Prompt Injection, Watermark Design