Align in Depth: Defending Jailbreak Attacks via Progressive Answer Detoxification | Authors: Yingjie Zhang, Tong Liu, Zhe Zhao, Guozhu Meng, Kai Chen | Published: 2025-03-14 | Tags: Disabling Safety Mechanisms of LLM, Prompt Injection, Malicious Prompt
CyberLLMInstruct: A Pseudo-malicious Dataset Revealing Safety-performance Trade-offs in Cyber Security LLM Fine-tuning | Authors: Adel ElZemity, Budi Arief, Shujun Li | Published: 2025-03-12 | Updated: 2025-09-17 | Tags: Disabling Safety Mechanisms of LLM, Security Analysis, Prompt Injection
Probabilistic Modeling of Jailbreak on Multimodal LLMs: From Quantification to Application | Authors: Wenzhuo Xu, Zhipeng Wei, Xiongtao Sun, Zonghao Ying, Deyue Zhang, Dongdong Yang, Xiangzheng Zhang, Quanchen Zou | Published: 2025-03-10 | Updated: 2025-07-31 | Tags: Prompt Injection, Large Language Model, Robustness of Watermarking Techniques
Improving LLM Safety Alignment with Dual-Objective Optimization | Authors: Xuandong Zhao, Will Cai, Tianneng Shi, David Huang, Licong Lin, Song Mei, Dawn Song | Published: 2025-03-05 | Updated: 2025-06-12 | Tags: Prompt Injection, Robustness Improvement Method, Trade-Off Between Safety And Usability
Steering Dialogue Dynamics for Robustness against Multi-turn Jailbreaking Attacks | Authors: Hanjiang Hu, Alexander Robey, Changliu Liu | Published: 2025-02-28 | Updated: 2025-08-25 | Tags: Backdoor Attack, Prompt Injection, Watermark
Beyond Surface-Level Patterns: An Essence-Driven Defense Framework Against Jailbreak Attacks in LLMs | Authors: Shiyu Xiang, Ansen Zhang, Yanfei Cao, Yang Fan, Ronghao Chen | Published: 2025-02-26 | Updated: 2025-05-28 | Tags: LLM Security, Prompt Injection, Attack Evaluation
GuidedBench: Measuring and Mitigating the Evaluation Discrepancies of In-the-wild LLM Jailbreak Methods | Authors: Ruixuan Huang, Xunguang Wang, Zongjie Li, Daoyuan Wu, Shuai Wang | Published: 2025-02-24 | Updated: 2025-07-09 | Tags: Prompt Injection, Jailbreak Method, Evaluation Method
Guardians of the Agentic System: Preventing Many Shots Jailbreak with Agentic System | Authors: Saikat Barua, Mostafizur Rahman, Md Jafor Sadek, Rafiul Islam, Shehenaz Khaled, Ahmedul Kabir | Published: 2025-02-23 | Updated: 2025-06-12 | Tags: Prompt Injection, Evaluation of Multi-Agent Systems, Adversarial Attack Assessment
SEA: Low-Resource Safety Alignment for Multimodal Large Language Models via Synthetic Embeddings | Authors: Weikai Lu, Hao Peng, Huiping Zhuang, Cen Chen, Ziqian Zeng | Published: 2025-02-18 | Updated: 2025-05-21 | Tags: Alignment, Text Generation Method, Prompt Injection
DELMAN: Dynamic Defense Against Large Language Model Jailbreaking with Model Editing | Authors: Yi Wang, Fenghua Weng, Sibei Yang, Zhan Qin, Minlie Huang, Wenjie Wang | Published: 2025-02-17 | Updated: 2025-05-29 | Tags: LLM Security, Prompt Injection, Defense Method