Literature Database

LLM Defenses Are Not Robust to Multi-Turn Human Jailbreaks Yet

Authors: Nathaniel Li, Ziwen Han, Ian Steneker, Willow Primack, Riley Goodside, Hugh Zhang, Zifan Wang, Cristina Menghini, Summer Yue | Published: 2024-08-27 | Updated: 2024-09-04
Prompt Injection
User Education
Attack Method

Top Score on the Wrong Exam: On Benchmarking in Machine Learning for Vulnerability Detection

Authors: Niklas Risse, Jing Liu, Marcel Böhme | Published: 2024-08-23 | Updated: 2025-04-23
Security Analysis
Vulnerability Management
Evaluation Method

Obfuscated Memory Malware Detection

Authors: Sharmila S P, Aruna Tiwari, Narendra S Chaudhari | Published: 2024-08-23
Cybersecurity
Malware Classification
Model Performance Evaluation

Is Generative AI the Next Tactical Cyber Weapon For Threat Actors? Unforeseen Implications of AI Generated Cyber Attacks

Authors: Yusuf Usman, Aadesh Upadhyay, Prashnna Gyawali, Robin Chataut | Published: 2024-08-23
Cybersecurity
Prompt Injection
Attack Method

LLM-PBE: Assessing Data Privacy in Large Language Models

Authors: Qinbin Li, Junyuan Hong, Chulin Xie, Jeffrey Tan, Rachel Xin, Junyi Hou, Xavier Yin, Zhun Wang, Dan Hendrycks, Zhangyang Wang, Bo Li, Bingsheng He, Dawn Song | Published: 2024-08-23 | Updated: 2024-09-06
LLM Security
Privacy Protection Method
Prompt Injection

Efficient Detection of Toxic Prompts in Large Language Models

Authors: Yi Liu, Junzhe Yu, Huijia Sun, Ling Shi, Gelei Deng, Yuqi Chen, Yang Liu | Published: 2024-08-21 | Updated: 2024-09-14
Content Moderation
Prompt Injection
Model Performance Evaluation

Private Counting of Distinct Elements in the Turnstile Model and Extensions

Authors: Monika Henzinger, A. R. Sricharan, Teresa Anna Steiner | Published: 2024-08-21
Algorithm
Privacy Protection Method

Large Language Models are Good Attackers: Efficient and Stealthy Textual Backdoor Attacks

Authors: Ziqiang Li, Yueqi Zeng, Pengfei Xia, Lei Liu, Zhangjie Fu, Bin Li | Published: 2024-08-21
Backdoor Attack
Poisoning

EEG-Defender: Defending against Jailbreak through Early Exit Generation of Large Language Models

Authors: Chongwen Zhao, Zhihao Dou, Kaizhu Huang | Published: 2024-08-21
LLM Security
Prompt Injection
Defense Method

Hide Your Malicious Goal Into Benign Narratives: Jailbreak Large Language Models through Carrier Articles

Authors: Zhilong Wang, Haizhou Wang, Nanqing Luo, Lan Zhang, Xiaoyan Sun, Yebo Cao, Peng Liu | Published: 2024-08-20 | Updated: 2025-02-07
Prompt Injection
Large Language Model
Attack Scenario Analysis