Prompt Injection

Backdooring Instruction-Tuned Large Language Models with Virtual Prompt Injection

Authors: Jun Yan, Vikas Yadav, Shiyang Li, Lichang Chen, Zheng Tang, Hai Wang, Vijay Srinivasan, Xiang Ren, Hongxia Jin | Published: 2023-07-31 | Updated: 2024-04-03
LLM Security
System Prompt Generation
Prompt Injection

Universal and Transferable Adversarial Attacks on Aligned Language Models

Authors: Andy Zou, Zifan Wang, Nicholas Carlini, Milad Nasr, J. Zico Kolter, Matt Fredrikson | Published: 2023-07-27 | Updated: 2023-12-20
LLM Security
Prompt Injection
Inappropriate Content Generation

Backdoor Attacks for In-Context Learning with Language Models

Authors: Nikhil Kandpal, Matthew Jagielski, Florian Tramèr, Nicholas Carlini | Published: 2023-07-27
LLM Security
Backdoor Attack
Prompt Injection

Unveiling Security, Privacy, and Ethical Concerns of ChatGPT

Authors: Xiaodong Wu, Ran Duan, Jianbing Ni | Published: 2023-07-26
LLM Security
Prompt Injection
Inappropriate Content Generation

Getting pwn’d by AI: Penetration Testing with Large Language Models

Authors: Andreas Happe, Jürgen Cito | Published: 2023-07-24 | Updated: 2023-08-17
LLM Security
Prompt Injection
Penetration Testing Methods

The Looming Threat of Fake and LLM-generated LinkedIn Profiles: Challenges and Opportunities for Detection and Prevention

Authors: Navid Ayoobi, Sadat Shahriar, Arjun Mukherjee | Published: 2023-07-21
Data Generation
Prompt Injection
Analysis of Detection Methods

A LLM Assisted Exploitation of AI-Guardian

Authors: Nicholas Carlini | Published: 2023-07-20
Prompt Injection
Membership Inference
Watermark Robustness

MasterKey: Automated Jailbreak Across Multiple Large Language Model Chatbots

Authors: Gelei Deng, Yi Liu, Yuekang Li, Kailong Wang, Ying Zhang, Zefeng Li, Haoyu Wang, Tianwei Zhang, Yang Liu | Published: 2023-07-16 | Updated: 2023-10-25
Data Leakage
Prompt Injection
Watermark Robustness

Time for aCTIon: Automated Analysis of Cyber Threat Intelligence in the Wild

Authors: Giuseppe Siracusano, Davide Sanvito, Roberto Gonzalez, Manikantan Srinivasan, Sivakaman Kamatchi, Wataru Takahashi, Masaru Kawakita, Takahiro Kakumaru, Roberto Bifulco | Published: 2023-07-14
Dataset Generation
Prompt Injection
Attack Pattern Extraction

Understanding Multi-Turn Toxic Behaviors in Open-Domain Chatbots

Authors: Bocheng Chen, Guangjing Wang, Hanqing Guo, Yuanda Wang, Qiben Yan | Published: 2023-07-14
Prompt Injection
Dialogue System
Attack Evaluation