These labels were automatically added by AI and may be inaccurate. For details, see About Literature Database.
Abstract
The critical challenge of prompt injection attacks in Large Language Models
(LLMs) integrated applications, a growing concern in the Artificial
Intelligence (AI) field. Such attacks, which manipulate LLMs through natural
language inputs, pose a significant threat to the security of these
applications. Traditional defense strategies, including output and input
filtering, as well as delimiter use, have proven inadequate. This paper
introduces the 'Signed-Prompt' method as a novel solution. The study involves
signing sensitive instructions within command segments by authorized users,
enabling the LLM to discern trusted instruction sources. The paper presents a
comprehensive analysis of prompt injection attack patterns, followed by a
detailed explanation of the Signed-Prompt concept, including its basic
architecture and implementation through both prompt engineering and fine-tuning
of LLMs. Experiments demonstrate the effectiveness of the Signed-Prompt method,
showing substantial resistance to various types of prompt injection attacks,
thus validating its potential as a robust defense strategy in AI security.