SQL Injection Jailbreak: A Structural Disaster of Large Language Models

Information in the literature database is collected automatically.

Source

https://arxiv.org/abs/2411.01565

PDF

https://arxiv.org/pdf/2411.01565

Paper Information

Authors: Jiawei Zhao, Kejiang Chen, Weiming Zhang, Nenghai Yu
Published: 2024-11-03
Updated: 2025-05-22
Affiliation: University of Science and Technology of China
Country: China
Conference: Annual Meeting of the Association for Computational Linguistics (ACL)

Labels Estimated by AI

Attack Type, Prompt Injection, Prompt Leaking

These labels were automatically added by AI and may be inaccurate.
For details, see About Literature Database.

Abstract

Large Language Models (LLMs) are susceptible to jailbreak attacks that can induce them to generate harmful content. Previous jailbreak methods primarily exploited the internal properties or capabilities of LLMs; examples include optimization-based jailbreak methods and methods that leverage the model's context-learning abilities. In this paper, we introduce a novel jailbreak method, SQL Injection Jailbreak (SIJ), which targets the external properties of LLMs, specifically the way LLMs construct input prompts. By injecting jailbreak information into user prompts, SIJ successfully induces the model to output harmful content. For open-source models, SIJ achieves attack success rates of nearly 100% on five well-known LLMs on the AdvBench and HEx-PHI datasets, while incurring lower time costs than previous methods. For closed-source models, SIJ achieves an average attack success rate of over 85% across five models in the GPT and Doubao series. Additionally, SIJ exposes a new vulnerability in LLMs that urgently requires mitigation. To address this, we propose a simple adaptive defense method called Self-Reminder-Key to counter SIJ and demonstrate its effectiveness through experimental results. Our code is available at https://github.com/weiyezhimeng/SQL-Injection-Jailbreak.

External Datasets

AdvBench

HEx-PHI