Prompt Injection

Fine-tuning Aligned Language Models Compromises Safety, Even When Users Do Not Intend To!

Authors: Xiangyu Qi, Yi Zeng, Tinghao Xie, Pin-Yu Chen, Ruoxi Jia, Prateek Mittal, Peter Henderson | Published: 2023-10-05

Data Collection

Prompt Injection

Information Gathering Methods

2023.10.05 2025.05.28

Literature Database

SmoothLLM: Defending Large Language Models Against Jailbreaking Attacks

Authors: Alexander Robey, Eric Wong, Hamed Hassani, George J. Pappas | Published: 2023-10-05 | Updated: 2024-06-11

LLM Performance Evaluation

Prompt Injection

Defense Method

2023.10.05 2025.05.28

Misusing Tools in Large Language Models With Visual Adversarial Examples

Authors: Xiaohan Fu, Zihan Wang, Shuheng Li, Rajesh K. Gupta, Niloofar Mireshghallah, Taylor Berg-Kirkpatrick, Earlence Fernandes | Published: 2023-10-04

LLM Performance Evaluation

Prompt Injection

Adversarial Example

2023.10.04 2025.05.28

Shadow Alignment: The Ease of Subverting Safely-Aligned Language Models

Authors: Xianjun Yang, Xiao Wang, Qi Zhang, Linda Petzold, William Yang Wang, Xun Zhao, Dahua Lin | Published: 2023-10-04

Prompt Injection

Safety Alignment

Malicious Content Generation

2023.10.04 2025.05.28

Low-Resource Languages Jailbreak GPT-4

Authors: Zheng-Xin Yong, Cristina Menghini, Stephen H. Bach | Published: 2023-10-03 | Updated: 2024-01-27

Prompt Injection

Safety Alignment

Vulnerability Detection

2023.10.03 2025.05.28

Jailbreaker in Jail: Moving Target Defense for Large Language Models

Authors: Bocheng Chen, Advait Paliwal, Qiben Yan | Published: 2023-10-03

LLM Performance Evaluation

Prompt Injection

Evaluation Metrics

2023.10.03 2025.05.28

On the Safety of Open-Sourced Large Language Models: Does Alignment Really Prevent Them From Being Misused?

Authors: Hangfan Zhang, Zhimeng Guo, Huaisheng Zhu, Bochuan Cao, Lu Lin, Jinyuan Jia, Jinghui Chen, Dinghao Wu | Published: 2023-10-02

LLM Performance Evaluation

Prompt Injection

Classification of Malicious Actors

2023.10.02 2025.05.28

Large Language Model-Powered Smart Contract Vulnerability Detection: New Perspectives

Authors: Sihao Hu, Tiansheng Huang, Fatih İlhan, Selim Furkan Tekin, Ling Liu | Published: 2023-10-02 | Updated: 2023-10-16

Security Analysis

Prompt Injection

Vulnerability Prediction

2023.10.02 2025.05.28

Watch Your Language: Investigating Content Moderation with Large Language Models

Authors: Deepak Kumar, Yousef AbuHashem, Zakir Durumeric | Published: 2023-09-25 | Updated: 2024-01-17

LLM Performance Evaluation

Prompt Injection

Inappropriate Content Generation

2023.09.25 2025.05.28

Can LLM-Generated Misinformation Be Detected?

Authors: Canyu Chen, Kai Shu | Published: 2023-09-25 | Updated: 2024-04-23

LLM Performance Evaluation

Prompt Injection

Inappropriate Content Generation

2023.09.25 2025.05.28
