Malicious Prompt

Align in Depth: Defending Jailbreak Attacks via Progressive Answer Detoxification

Authors: Yingjie Zhang, Tong Liu, Zhe Zhao, Guozhu Meng, Kai Chen | Published: 2025-03-14

Disabling Safety Mechanisms of LLM

Prompt Injection

Malicious Prompt

2025.03.14 2025.05.27

Literature Database

Can Indirect Prompt Injection Attacks Be Detected and Removed?

Authors: Yulin Chen, Haoran Li, Yuan Sui, Yufei He, Yue Liu, Yangqiu Song, Bryan Hooi | Published: 2025-02-23

Prompt validation

Malicious Prompt

Attack Method

2025.02.23 2025.05.27

Literature Database

Dagger Behind Smile: Fool LLMs with a Happy Ending Story

Authors: Xurui Song, Zhixin Xie, Shuo Huai, Jiayi Kong, Jun Luo | Published: 2025-01-19 | Updated: 2025-09-30

Disabling Safety Mechanisms of LLM

Malicious Prompt

Effectiveness of Attack Methods

2025.01.19 2025.10.02

Literature Database

Toxicity Detection for Free

Authors: Zhanhao Hu, Julien Piet, Geng Zhao, Jiantao Jiao, David Wagner | Published: 2024-05-29 | Updated: 2024-11-08

Indirect Prompt Injection

Prompt validation

Malicious Prompt

2024.05.29 2025.05.27

Literature Database

Defending Against Indirect Prompt Injection Attacks With Spotlighting

Authors: Keegan Hines, Gary Lopez, Matthew Hall, Federico Zarfati, Yonatan Zunger, Emre Kiciman | Published: 2024-03-20

Indirect Prompt Injection

Prompt Injection

Malicious Prompt

2024.03.20 2025.05.27

Literature Database

Reconstruct Your Previous Conversations! Comprehensively Investigating Privacy Leakage Risks in Conversations with GPT Models

Authors: Junjie Chu, Zeyang Sha, Michael Backes, Yang Zhang | Published: 2024-02-05 | Updated: 2024-10-07

Privacy Protection

Prompt Injection

Malicious Prompt

2024.02.05 2025.05.27

Literature Database

Benchmarking and Defending Against Indirect Prompt Injection Attacks on Large Language Models

Authors: Jingwei Yi, Yueqi Xie, Bin Zhu, Emre Kiciman, Guangzhong Sun, Xing Xie, Fangzhao Wu | Published: 2023-12-21 | Updated: 2025-01-27

Indirect Prompt Injection

Malicious Prompt

Vulnerability Analysis

2023.12.21 2025.05.27

Literature Database

An LLM can Fool Itself: A Prompt-Based Adversarial Attack

Authors: Xilie Xu, Keyi Kong, Ning Liu, Lizhen Cui, Di Wang, Jingfeng Zhang, Mohan Kankanhalli | Published: 2023-10-20

Prompt Injection

Malicious Prompt

Adversarial attack

2023.10.20 2025.05.28

Literature Database

Detecting Language Model Attacks with Perplexity

Authors: Gabriel Alon, Michael Kamfonas | Published: 2023-08-27 | Updated: 2023-11-07

LLM Security

Prompt Injection

Malicious Prompt

2023.08.27 2025.05.28

Literature Database

Abusing Images and Sounds for Indirect Instruction Injection in Multi-Modal LLMs

Authors: Eugene Bagdasaryan, Tsung-Yin Hsieh, Ben Nassi, Vitaly Shmatikov | Published: 2023-07-19 | Updated: 2023-10-03

Indirect Prompt Injection

Malicious Prompt

Adversarial Example

2023.07.19 2025.05.28

Literature Database