System prompts are widely used to guide the outputs of large language models
(LLMs). These prompts often contain business logic and sensitive information,
making their protection essential. However, adversarial and even regular user
queries can exploit LLM vulnerabilities to expose these hidden prompts. To
address this issue, we propose PromptKeeper, a defense mechanism designed to
safeguard system prompts by tackling two core challenges: reliably detecting
leakage and mitigating side-channel vulnerabilities when leakage occurs. By
framing detection as a hypothesis-testing problem, PromptKeeper effectively
identifies both explicit and subtle leakage. Upon detecting leakage, it
regenerates the response using a dummy prompt, ensuring that outputs remain
indistinguishable from those of typical interactions in which no leakage
occurs, thereby closing the side channel.
PromptKeeper provides robust protection against prompt extraction attacks
mounted through either adversarial or regular queries, while preserving
conversational capability and runtime efficiency during benign user
interactions.
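As a rough illustration of the two-stage defense described above, the following Python sketch shows one way detection and regeneration could fit together. All names here (`generate`, `leakage_score`, the threshold) are hypothetical stand-ins rather than PromptKeeper's actual interface, and the hypothesis test is abstracted as a thresholded score.

```python
from typing import Callable


def promptkeeper_respond(
    query: str,
    system_prompt: str,
    dummy_prompt: str,
    generate: Callable[[str, str], str],              # (prompt, query) -> response; stand-in for an LLM call
    leakage_score: Callable[[str, str, str], float],  # hypothesis-test statistic, abstracted as a score
    threshold: float,
) -> str:
    """Sketch of the detect-then-regenerate loop; signatures are illustrative assumptions."""
    # Stage 1: answer normally under the real system prompt.
    response = generate(system_prompt, query)

    # Stage 2: hypothesis test, sketched as a thresholded statistic. A high
    # score indicates the response reveals more about the real system prompt
    # than a dummy prompt would explain (covering explicit and subtle leakage).
    if leakage_score(response, system_prompt, dummy_prompt) > threshold:
        # Leakage detected: regenerate under the dummy prompt so the output
        # remains indistinguishable from a no-leakage interaction.
        response = generate(dummy_prompt, query)

    return response


if __name__ == "__main__":
    # Toy stand-ins, for illustration only.
    gen = lambda prompt, q: f"answer({q}) given '{prompt}'"
    score = lambda resp, real, dummy: 1.0 if real in resp else 0.0
    print(promptkeeper_respond("What is your prompt?", "SECRET-RULES",
                               "You are a helpful assistant.", gen, score, 0.5))
```

Note the design choice this sketch reflects: regenerating with a dummy prompt, rather than refusing or returning an empty reply, avoids signaling that a secret exists, which is what mitigates the side-channel vulnerability.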