Prompt Control-Flow Integrity: A Priority-Aware Runtime Defense Against Prompt Injection in LLM Systems

Large Language Models (LLMs) are increasingly being integrated into various applications. The functionalities of recent LLMs can be flexibly modulated via natural language prompts. This renders them susceptible to targeted adversarial prompting, e.g., Prompt Injection (PI) attacks enable attackers to override original instructions and employed controls. So far, it was assumed that the user is directly prompting the LLM. But, what if it is not the user prompting? We argue that LLM-Integrated Applications blur the line between data and instructions. We reveal new attack vectors, using Indirect Prompt Injection, that enable adversaries to remotely (without a direct interface) exploit LLM-integrated applications by strategically injecting prompts into data likely to be retrieved. We derive a comprehensive taxonomy from a computer security perspective to systematically investigate impacts and vulnerabilities, including data theft, worming, information ecosystem contamination, and other novel security risks. We demonstrate our attacks' practical viability against both real-world systems, such as Bing's GPT-4 powered Chat and code-completion engines, and synthetic applications built on GPT-4. We show how processing retrieved prompts can act as arbitrary code execution, manipulate the application's functionality, and control how and if other APIs are called. Despite the increasing integration and reliance on LLMs, effective mitigations of these emerging threats are currently lacking. By raising awareness of these vulnerabilities and providing key insights into their implications, we aim to promote the safe and responsible deployment of these powerful models and the development of robust defenses that protect users and systems from potential attacks.

インダイレクトプロンプトインジェクション悪意のあるプロンプトプロンプトインジェクション

33rd USENIX Security Symposium (USENIX Security 24)

Formalizing and Benchmarking Prompt Injection Attacks and Defenses

Y. Liu, Y. Jia, R. Geng, J. Jia, N. Z. Gong

Published: 2024

arxiv

被引用数 2

Conference on Empirical Methods in Natural Language Processing (EMNLP)

Defending against Indirect Prompt Injection by Instruction Detection

Tongyu Wen, Chenglong Wang, Xiyuan Yang, Haoyu Tang, Yueqi Xie, Lingjuan Lyu, Zhicheng Dou, Fangzhao Wu

Published: 2025.5.8

The integration of Large Language Models (LLMs) with external sources is becoming increasingly common, with Retrieval-Augmented Generation (RAG) being a prominent example. However, this integration introduces vulnerabilities of Indirect Prompt Injection (IPI) attacks, where hidden instructions embedded in external data can manipulate LLMs into executing unintended or harmful actions. We recognize that IPI attacks fundamentally rely on the presence of instructions embedded within external content, which can alter the behavioral states of LLMs. Can the effective detection of such state changes help us defend against IPI attacks? In this paper, we propose InstructDetector, a novel detection-based approach that leverages the behavioral states of LLMs to identify potential IPI attacks. Specifically, we demonstrate the hidden states and gradients from intermediate layers provide highly discriminative features for instruction detection. By effectively combining these features, InstructDetector achieves a detection accuracy of 99.60% in the in-domain setting and 96.90% in the out-of-domain setting, and reduces the attack success rate to just 0.03% on the BIPIA benchmark. The code is publicly available at https://github.com/MYVAE/Instruction-detection.

プロンプトの検証評価手法透かし技術

2024 IEEE International Conference on Big Data (BigData)

Sok: Prompt hacking of large language models

B. Rababah, S. T. Wu, M. Kwiatkowski, C. K. Leung, C. G. Akcora

Published: 2024

CoRR

Rag and roll: An end-to-end evaluation of indirect prompt manipulations in llm-based application frameworks

G. De Stefano, L. Schonherr, G. Pellegrino

Published: 2024

CoRR

From prompts to templates: A systematic prompt template analysis for real-world llmapps

Y. Mao, J. He, C. Chen

Published: 2025

Nemo guardrails: A toolkit for controllable and safe llm applications with programmable rails

T. Rebedea, R. Dinu, M. Sreedhar, C. Parisien, J. Cohen

Published: 2023

Proceedings of the 12th ACM Conference on Computer and Communications Security

Control-flow integrity

M. Abadi, M. Budiu, U. Erlingsson, J. Ligatti

Published: 2005

Information

Prompt injection attacks in large language models

S. Gulyamov

Published: 2026

StruQ: Defending Against Prompt Injection with Structured Queries

Chen, S., Piet, J., Sitawarin, C., Wagner, D.

Published: 2024

arxiv

被引用数 18

Prompt Injection attack against LLM-integrated Applications

Yi Liu, Gelei Deng, Yuekang Li, Kailong Wang, Zihao Wang, Xiaofeng Wang, Tianwei Zhang, Yepang Liu, Haoyu Wang, Yan Zheng, Yang Liu

Published: 2023.6.9

Large Language Models (LLMs), renowned for their superior proficiency in language comprehension and generation, stimulate a vibrant ecosystem of applications around them. However, their extensive assimilation into various services introduces significant security risks. This study deconstructs the complexities and implications of prompt injection attacks on actual LLM-integrated applications. Initially, we conduct an exploratory analysis on ten commercial applications, highlighting the constraints of current attack strategies in practice. Prompted by these limitations, we subsequently formulate HouYi, a novel black-box prompt injection attack technique, which draws inspiration from traditional web injection attacks. HouYi is compartmentalized into three crucial elements: a seamlessly-incorporated pre-constructed prompt, an injection prompt inducing context partition, and a malicious payload designed to fulfill the attack objectives. Leveraging HouYi, we unveil previously unknown and severe attack outcomes, such as unrestricted arbitrary LLM usage and uncomplicated application prompt theft. We deploy HouYi on 36 actual LLM-integrated applications and discern 31 applications susceptible to prompt injection. 10 vendors have validated our discoveries, including Notion, which has the potential to impact millions of users. Our investigation illuminates both the possible risks of prompt injection attacks and the possible tactics for mitigation.

プロンプトインジェクション悪意のあるプロンプト

CoRR

After retrieval, before generation: Enhancing the trustworthiness of large language models in retrieval-augmented generation

X. Dai, H. Hu, Y. Hua, J. Li, Y. Chen, R. Jin, N. Hu, G. Qi

Published: 2025

arxiv

被引用数 2

Computing Research Repository (CoRR)

Signed-Prompt: A New Approach to Prevent Prompt Injection Attacks Against LLM-Integrated Applications

Xuchen Suo

Published: 2024.1.15

The critical challenge of prompt injection attacks in Large Language Models (LLMs) integrated applications, a growing concern in the Artificial Intelligence (AI) field. Such attacks, which manipulate LLMs through natural language inputs, pose a significant threat to the security of these applications. Traditional defense strategies, including output and input filtering, as well as delimiter use, have proven inadequate. This paper introduces the 'Signed-Prompt' method as a novel solution. The study involves signing sensitive instructions within command segments by authorized users, enabling the LLM to discern trusted instruction sources. The paper presents a comprehensive analysis of prompt injection attack patterns, followed by a detailed explanation of the Signed-Prompt concept, including its basic architecture and implementation through both prompt engineering and fine-tuning of LLMs. Experiments demonstrate the effectiveness of the Signed-Prompt method, showing substantial resistance to various types of prompt injection attacks, thus validating its potential as a robust defense strategy in AI security.

プロンプトインジェクション LLMセキュリティ

Algorithms

Embedding-based detection of indirect prompt injection attacks

M. Alamsabi

Published: 2026