AIセキュリティポータル K Program
Prompt Control-Flow Integrity: A Priority-Aware Runtime Defense Against Prompt Injection in LLM Systems
Share
Abstract
Large language models (LLMs) deployed behind APIs and retrieval-augmented generation (RAG) stacks are vulnerable to prompt injection attacks that may override system policies, subvert intended behavior, and induce unsafe outputs. Existing defenses often treat prompts as flat strings and rely on ad hoc filtering or static jailbreak detection. This paper proposes Prompt Control-Flow Integrity (PCFI), a priority-aware runtime defense that models each request as a structured composition of system, developer, user, and retrieved-document segments. PCFI applies a three-stage middleware pipeline, lexical heuristics, role-switch detection, and hierarchical policy enforcement, before forwarding requests to the backend LLM. We implement PCFI as a FastAPI-based gateway for deployed LLM APIs and evaluate it on a custom benchmark of synthetic and semi-realistic prompt-injection workloads. On the evaluated benchmark suite, PCFI intercepts all attack-labeled requests, maintains a 0% False Positive Rate, and introduces a median processing overhead of only 0.04 ms. These results suggest that provenance- and priority-aware prompt enforcement is a practical and lightweight defense for deployed LLM systems.
Not what you've signed up for: Compromising Real-World LLM-Integrated Applications with Indirect Prompt Injection
Kai Greshake, Sahar Abdelnabi, Shailesh Mishra, Christoph Endres, Thorsten Holz, Mario Fritz
Published: 2023.2.24
Formalizing and Benchmarking Prompt Injection Attacks and Defenses
Y. Liu, Y. Jia, R. Geng, J. Jia, N. Z. Gong
Published: 2024
Defending against Indirect Prompt Injection by Instruction Detection
Tongyu Wen, Chenglong Wang, Xiyuan Yang, Haoyu Tang, Yueqi Xie, Lingjuan Lyu, Zhicheng Dou, Fangzhao Wu
Published: 2025.5.8
Sok: Prompt hacking of large language models
B. Rababah, S. T. Wu, M. Kwiatkowski, C. K. Leung, C. G. Akcora
Published: 2024
Nemo guardrails: A toolkit for controllable and safe llm applications with programmable rails
T. Rebedea, R. Dinu, M. Sreedhar, C. Parisien, J. Cohen
Published: 2023
Control-flow integrity
M. Abadi, M. Budiu, U. Erlingsson, J. Ligatti
Published: 2005
Prompt injection attacks in large language models
S. Gulyamov
Published: 2026
Prompt Injection attack against LLM-integrated Applications
Yi Liu, Gelei Deng, Yuekang Li, Kailong Wang, Zihao Wang, Xiaofeng Wang, Tianwei Zhang, Yepang Liu, Haoyu Wang, Yan Zheng, Yang Liu
Published: 2023.6.9
Embedding-based detection of indirect prompt injection attacks
M. Alamsabi
Published: 2026
Share