Inappropriate Content Generation

Using Hallucinations to Bypass GPT4’s Filter

Authors: Benjamin Lemkin | Published: 2024-02-16 | Updated: 2024-03-11
LLM Security
Prompt Injection
Inappropriate Content Generation

Make Them Spill the Beans! Coercive Knowledge Extraction from (Production) LLMs

Authors: Zhuo Zhang, Guangyu Shen, Guanhong Tao, Siyuan Cheng, Xiangyu Zhang | Published: 2023-12-08
LLM Security
Prompt Injection
Inappropriate Content Generation

Comprehensive Assessment of Toxicity in ChatGPT

Authors: Boyang Zhang, Xinyue Shen, Wai Man Si, Zeyang Sha, Zeyuan Chen, Ahmed Salem, Yun Shen, Michael Backes, Yang Zhang | Published: 2023-11-03
Abuse of AI Chatbots
Prompt Injection
Inappropriate Content Generation

Watch Your Language: Investigating Content Moderation with Large Language Models

Authors: Deepak Kumar, Yousef AbuHashem, Zakir Durumeric | Published: 2023-09-25 | Updated: 2024-01-17
LLM Performance Evaluation
Prompt Injection
Inappropriate Content Generation

Can LLM-Generated Misinformation Be Detected?

Authors: Canyu Chen, Kai Shu | Published: 2023-09-25 | Updated: 2024-04-23
LLM Performance Evaluation
Prompt Injection
Inappropriate Content Generation

Universal and Transferable Adversarial Attacks on Aligned Language Models

Authors: Andy Zou, Zifan Wang, Nicholas Carlini, Milad Nasr, J. Zico Kolter, Matt Fredrikson | Published: 2023-07-27 | Updated: 2023-12-20
LLM Security
Prompt Injection
Inappropriate Content Generation

Unveiling Security, Privacy, and Ethical Concerns of ChatGPT

Authors: Xiaodong Wu, Ran Duan, Jianbing Ni | Published: 2023-07-26
LLM Security
Prompt Injection
Inappropriate Content Generation

Visual Adversarial Examples Jailbreak Aligned Large Language Models

Authors: Xiangyu Qi, Kaixuan Huang, Ashwinee Panda, Peter Henderson, Mengdi Wang, Prateek Mittal | Published: 2023-06-22 | Updated: 2023-08-16
Prompt Injection
Inappropriate Content Generation
Adversarial Attack