Prompt Injection

Detecting LLM-Generated Peer Reviews

Authors: Vishisht Rao, Aounon Kumar, Himabindu Lakkaraju, Nihar B. Shah | Published: 2025-03-20 | Updated: 2025-05-19
Tags: Prompt Injection, Digital Watermarking for Generative AI, Watermark Design

Towards Understanding the Safety Boundaries of DeepSeek Models: Evaluation and Findings

Authors: Zonghao Ying, Guangyi Zheng, Yongxin Huang, Deyue Zhang, Wenxin Zhang, Quanchen Zou, Aishan Liu, Xianglong Liu, Dacheng Tao | Published: 2025-03-19
Tags: Prompt Injection, Large Language Model, Attack Method

Temporal Context Awareness: A Defense Framework Against Multi-turn Manipulation Attacks on Large Language Models

Authors: Prashant Kulkarni, Assaf Namer | Published: 2025-03-18
Tags: Prompt Injection, Prompt Leaking, Attack Method

MirrorShield: Towards Universal Defense Against Jailbreaks via Entropy-Guided Mirror Crafting

Authors: Rui Pu, Chaozhuo Li, Rui Ha, Litian Zhang, Lirong Qiu, Xi Zhang | Published: 2025-03-17 | Updated: 2025-05-20
Tags: Prompt Injection, Large Language Model, Attack Method

Align in Depth: Defending Jailbreak Attacks via Progressive Answer Detoxification

Authors: Yingjie Zhang, Tong Liu, Zhe Zhao, Guozhu Meng, Kai Chen | Published: 2025-03-14
Tags: Disabling Safety Mechanisms of LLM, Prompt Injection, Malicious Prompt

CyberLLMInstruct: A Pseudo-malicious Dataset Revealing Safety-performance Trade-offs in Cyber Security LLM Fine-tuning

Authors: Adel ElZemity, Budi Arief, Shujun Li | Published: 2025-03-12 | Updated: 2025-09-17
Tags: Disabling Safety Mechanisms of LLM, Security Analysis, Prompt Injection

Probabilistic Modeling of Jailbreak on Multimodal LLMs: From Quantification to Application

Authors: Wenzhuo Xu, Zhipeng Wei, Xiongtao Sun, Zonghao Ying, Deyue Zhang, Dongdong Yang, Xiangzheng Zhang, Quanchen Zou | Published: 2025-03-10 | Updated: 2025-07-31
Tags: Prompt Injection, Large Language Model, Robustness of Watermarking Techniques

Improving LLM Safety Alignment with Dual-Objective Optimization

Authors: Xuandong Zhao, Will Cai, Tianneng Shi, David Huang, Licong Lin, Song Mei, Dawn Song | Published: 2025-03-05 | Updated: 2025-06-12
Tags: Prompt Injection, Robustness Improvement Method, Trade-Off Between Safety and Usability

Steering Dialogue Dynamics for Robustness against Multi-turn Jailbreaking Attacks

Authors: Hanjiang Hu, Alexander Robey, Changliu Liu | Published: 2025-02-28 | Updated: 2025-08-25
Tags: Backdoor Attack, Prompt Injection, Watermark

Beyond Surface-Level Patterns: An Essence-Driven Defense Framework Against Jailbreak Attacks in LLMs

Authors: Shiyu Xiang, Ansen Zhang, Yanfei Cao, Yang Fan, Ronghao Chen | Published: 2025-02-26 | Updated: 2025-05-28
Tags: LLM Security, Prompt Injection, Attack Evaluation