LLM Security

Purple Llama CyberSecEval: A Secure Coding Benchmark for Language Models

Authors: Manish Bhatt, Sahana Chennabasappa, Cyrus Nikolaidis, Shengye Wan, Ivan Evtimov, Dominik Gabi, Daniel Song, Faizan Ahmad, Cornelius Aschermann, Lorenzo Fontana, Sasha Frolov, Ravi Prakash Giri, Dhaval Kapil, Yiannis Kozyrakis, David LeBlanc, James Milazzo, Aleksandar Straumann, Gabriel Synnaeve, Varun Vontimitta, Spencer Whitman, Joshua Saxe | Published: 2023-12-07
LLM Security
Cybersecurity
Prompt Injection

FuzzLLM: A Novel and Universal Fuzzing Framework for Proactively Discovering Jailbreak Vulnerabilities in Large Language Models

Authors: Dongyu Yao, Jianshu Zhang, Ian G. Harris, Marcel Carlsson | Published: 2023-09-11 | Updated: 2024-04-14
LLM Security
Watermarking
Prompt Injection

Detecting Language Model Attacks with Perplexity

Authors: Gabriel Alon, Michael Kamfonas | Published: 2023-08-27 | Updated: 2023-11-07
LLM Security
Prompt Injection
Malicious Prompt

ZeroLeak: Using LLMs for Scalable and Cost Effective Side-Channel Patching

Authors: M. Caner Tol, Berk Sunar | Published: 2023-08-24
LLM Security
Vulnerability Mitigation Technique
Watermark Robustness

Out of the Cage: How Stochastic Parrots Win in Cyber Security Environments

Authors: Maria Rigaki, Ondřej Lukáš, Carlos A. Catania, Sebastian Garcia | Published: 2023-08-23 | Updated: 2023-08-28
LLM Security
Experimental Validation
Reinforcement Learning Environment

DIVAS: An LLM-based End-to-End Framework for SoC Security Analysis and Policy-based Protection

Authors: Sudipta Paria, Aritra Dasgupta, Swarup Bhunia | Published: 2023-08-14
LLM Security
Security Assurance
Vulnerability Mitigation Technique

“Do Anything Now”: Characterizing and Evaluating In-The-Wild Jailbreak Prompts on Large Language Models

Authors: Xinyue Shen, Zeyuan Chen, Michael Backes, Yun Shen, Yang Zhang | Published: 2023-08-07 | Updated: 2024-05-15
LLM Security
Character Role Acting
Prompt Injection

Backdooring Instruction-Tuned Large Language Models with Virtual Prompt Injection

Authors: Jun Yan, Vikas Yadav, Shiyang Li, Lichang Chen, Zheng Tang, Hai Wang, Vijay Srinivasan, Xiang Ren, Hongxia Jin | Published: 2023-07-31 | Updated: 2024-04-03
LLM Security
System Prompt Generation
Prompt Injection

Universal and Transferable Adversarial Attacks on Aligned Language Models

Authors: Andy Zou, Zifan Wang, Nicholas Carlini, Milad Nasr, J. Zico Kolter, Matt Fredrikson | Published: 2023-07-27 | Updated: 2023-12-20
LLM Security
Prompt Injection
Inappropriate Content Generation

Backdoor Attacks for In-Context Learning with Language Models

Authors: Nikhil Kandpal, Matthew Jagielski, Florian Tramèr, Nicholas Carlini | Published: 2023-07-27
LLM Security
Backdoor Attack
Prompt Injection