Prompt Injection

Shadow Alignment: The Ease of Subverting Safely-Aligned Language Models

Authors: Xianjun Yang, Xiao Wang, Qi Zhang, Linda Petzold, William Yang Wang, Xun Zhao, Dahua Lin | Published: 2023-10-04
Prompt Injection
Safety Alignment
Malicious Content Generation

Low-Resource Languages Jailbreak GPT-4

Authors: Zheng-Xin Yong, Cristina Menghini, Stephen H. Bach | Published: 2023-10-03 | Updated: 2024-01-27
Prompt Injection
Safety Alignment
Vulnerability Detection

Jailbreaker in Jail: Moving Target Defense for Large Language Models

Authors: Bocheng Chen, Advait Paliwal, Qiben Yan | Published: 2023-10-03
LLM Performance Evaluation
Prompt Injection
Evaluation Metrics

On the Safety of Open-Sourced Large Language Models: Does Alignment Really Prevent Them From Being Misused?

Authors: Hangfan Zhang, Zhimeng Guo, Huaisheng Zhu, Bochuan Cao, Lu Lin, Jinyuan Jia, Jinghui Chen, Dinghao Wu | Published: 2023-10-02
LLM Performance Evaluation
Prompt Injection
Classification of Malicious Actors

Large Language Model-Powered Smart Contract Vulnerability Detection: New Perspectives

Authors: Sihao Hu, Tiansheng Huang, Fatih İlhan, Selim Furkan Tekin, Ling Liu | Published: 2023-10-02 | Updated: 2023-10-16
Security Analysis
Prompt Injection
Vulnerability Prediction

Watch Your Language: Investigating Content Moderation with Large Language Models

Authors: Deepak Kumar, Yousef AbuHashem, Zakir Durumeric | Published: 2023-09-25 | Updated: 2024-01-17
LLM Performance Evaluation
Prompt Injection
Inappropriate Content Generation

Can LLM-Generated Misinformation Be Detected?

Authors: Canyu Chen, Kai Shu | Published: 2023-09-25 | Updated: 2024-04-23
LLM Performance Evaluation
Prompt Injection
Inappropriate Content Generation

Defending Against Alignment-Breaking Attacks via Robustly Aligned LLM

Authors: Bochuan Cao, Yuanpu Cao, Lu Lin, Jinghui Chen | Published: 2023-09-18 | Updated: 2024-06-12
Prompt Injection
Safety Alignment
Defense Method

FuzzLLM: A Novel and Universal Fuzzing Framework for Proactively Discovering Jailbreak Vulnerabilities in Large Language Models

Authors: Dongyu Yao, Jianshu Zhang, Ian G. Harris, Marcel Carlsson | Published: 2023-09-11 | Updated: 2024-04-14
LLM Security
Watermarking
Prompt Injection

Vulnerability of Machine Learning Approaches Applied in IoT-based Smart Grid: A Review

Authors: Zhenyong Zhang, Mengxiang Liu, Mingyang Sun, Ruilong Deng, Peng Cheng, Dusit Niyato, Mo-Yuen Chow, Jiming Chen | Published: 2023-08-30 | Updated: 2023-12-25
Energy Management
Prompt Injection
Adversarial Training