Signed-Prompt: A New Approach to Prevent Prompt Injection Attacks Against LLM-Integrated Applications | AIセキュリティポータル

EN

JA

EN

TOP 文献データベース Signed-Prompt: A New Approach to Prevent Prompt Injection Attacks Against LLM-Integrated Applications

arxiv

Signed-Prompt: A New Approach to Prevent Prompt Injection Attacks Against LLM-Integrated Applications

AIセキュリティポータルbot

文献データベースの情報は、自動的に収集されています。

Source

https://arxiv.org/abs/2401.07612

PDF

https://arxiv.org/pdf/2401.07612

文献情報

作者: Xuchen Suo
公開日: 2024-1-15
所属機関: Department of Electrical and Electronic Engineering, The Hong Kong Polytechnic University
所属の国: Hong Kong, China
会議名: Computing Research Repository (CoRR)

AIにより推定されたラベル

プロンプトインジェクション LLMセキュリティ

※ こちらのラベルはAIによって自動的に追加されました。そのため、正確でないことがあります。
詳細は文献データベースについてをご覧ください。

Abstract

The critical challenge of prompt injection attacks in Large Language Models (LLMs) integrated applications, a growing concern in the Artificial Intelligence (AI) field. Such attacks, which manipulate LLMs through natural language inputs, pose a significant threat to the security of these applications. Traditional defense strategies, including output and input filtering, as well as delimiter use, have proven inadequate. This paper introduces the 'Signed-Prompt' method as a novel solution. The study involves signing sensitive instructions within command segments by authorized users, enabling the LLM to discern trusted instruction sources. The paper presents a comprehensive analysis of prompt injection attack patterns, followed by a detailed explanation of the Signed-Prompt concept, including its basic architecture and implementation through both prompt engineering and fine-tuning of LLMs. Experiments demonstrate the effectiveness of the Signed-Prompt method, showing substantial resistance to various types of prompt injection attacks, thus validating its potential as a robust defense strategy in AI security.

外部データセット

Delete Command Dataset

参考文献

Evaluating the susceptibility of pre-trained language models via handcrafted adversarial examples

H. J., Branch, J. R., Cefalu, J., McHugh, L., Hujer, A., Bahl, D. D. C., Iglesias, R., Darwishi

Published: 2022

Proceedings of the 16th ACM Workshop on Artificial Intelligence and Security

Not What You've Signed Up For: Compromising Real-World LLM-Integrated Applications with Indirect Prompt Injection

S., Abdelnabi, K., Greshake, S., Mishra, C., Endres, T., Holz, M., Fritz

Published: 2023

Prompt Injection Attacks and Defenses in LLM-Integrated Applications

Yupei Liu, Yuqi Jia, Runpeng Geng, Jinyuan Jia, Neil Zhenqiang Gong

Published: 2023

The Dual LLM Pattern for Building AI Assistants that can resist prompt injection

Published: 2023

Assessing Prompt Injection Risks in 200+ Custom GPTs

Jiahao Yu, Yuhang Wu, Dong Shu, Mingyu Jin, Sabrina Yang, Xinyu Xing

Published: 2023.11.20

In the rapidly evolving landscape of artificial intelligence, ChatGPT has been widely used in various applications. The new feature - customization of ChatGPT models by users to cater to specific needs has opened new frontiers in AI utility. However, this study reveals a significant security vulnerability inherent in these user-customized GPTs: prompt injection attacks. Through comprehensive testing of over 200 user-designed GPT models via adversarial prompts, we demonstrate that these systems are susceptible to prompt injections. Through prompt injection, an adversary can not only extract the customized system prompts but also access the uploaded files. This paper provides a first-hand analysis of the prompt injection, alongside the evaluation of the possible mitigation of such attacks. Our findings underscore the urgent need for robust security frameworks in the design and deployment of customizable GPT models. The intent of this paper is to raise awareness and prompt action in the AI community, ensuring that the benefits of GPT customization do not come at the cost of compromised security and privacy.

プロンプトインジェクションプロンプトリーキング対話システム

International Conference on Learning Representations (ICLR)

Tensor Trust: Interpretable Prompt Injection Attacks from an Online Game

Sam Toyer, Olivia Watkins, Ethan Adrian Mendes, Justin Svegliato, Luke Bailey, Tiffany Wang, Isaac Ong, Karim Elmaaroufi, Pieter Abbeel, Trevor Darrell, Alan Ritter, Stuart Russell

Published: 2023.11.2

While Large Language Models (LLMs) are increasingly being used in real-world applications, they remain vulnerable to prompt injection attacks: malicious third party prompts that subvert the intent of the system designer. To help researchers study this problem, we present a dataset of over 126,000 prompt injection attacks and 46,000 prompt-based "defenses" against prompt injection, all created by players of an online game called Tensor Trust. To the best of our knowledge, this is currently the largest dataset of human-generated adversarial examples for instruction-following LLMs. The attacks in our dataset have a lot of easily interpretable stucture, and shed light on the weaknesses of LLMs. We also use the dataset to create a benchmark for resistance to two types of prompt injection, which we refer to as prompt extraction and prompt hijacking. Our benchmark results show that many models are vulnerable to the attack strategies in the Tensor Trust dataset. Furthermore, we show that some attack strategies from the dataset generalize to deployed LLM-based applications, even though they have a very different set of constraints to the game. We release all data and source code at https://tensortrust.ai/paper

プロンプトインジェクションプロンプトエンジニアリングロバスト性評価

被引用数 18

Prompt Injection attack against LLM-integrated Applications

Yi Liu, Gelei Deng, Yuekang Li, Kailong Wang, Zihao Wang, Xiaofeng Wang, Tianwei Zhang, Yepang Liu, Haoyu Wang, Yan Zheng, Yang Liu

Published: 2023.6.9

Large Language Models (LLMs), renowned for their superior proficiency in language comprehension and generation, stimulate a vibrant ecosystem of applications around them. However, their extensive assimilation into various services introduces significant security risks. This study deconstructs the complexities and implications of prompt injection attacks on actual LLM-integrated applications. Initially, we conduct an exploratory analysis on ten commercial applications, highlighting the constraints of current attack strategies in practice. Prompted by these limitations, we subsequently formulate HouYi, a novel black-box prompt injection attack technique, which draws inspiration from traditional web injection attacks. HouYi is compartmentalized into three crucial elements: a seamlessly-incorporated pre-constructed prompt, an injection prompt inducing context partition, and a malicious payload designed to fulfill the attack objectives. Leveraging HouYi, we unveil previously unknown and severe attack outcomes, such as unrestricted arbitrary LLM usage and uncomplicated application prompt theft. We deploy HouYi on 36 actual LLM-integrated applications and discern 31 applications susceptible to prompt injection. 10 vendors have validated our discoveries, including Notion, which has the potential to impact millions of users. Our investigation illuminates both the possible risks of prompt injection attacks and the possible tactics for mitigation.

プロンプトインジェクション悪意のあるプロンプト

Meta-Radiology

Review of large vision models and visual prompt engineering

J., Wang, Z., Liu, L., Zhao, Z., Wu, C., Ma, S., Yu, S., Zhang

Published: 2023

Advances in Neural Information Processing Systems

Improved regularization and robustness for fine-tuning in neural networks

D., Li, H., Zhang

Published: 2021

Fine-tuning can distort pretrained features and underperform out-of-distribution

A., Kumar, A., Raghunathan, R., Jones, T., Ma, P., Liang

Published: 2022