arxiv
Hoist with His Own Petard: Inducing Guardrails to Facilitate Denial-of-Service Attacks on Retrieval-Augmented Generation of LLMs
Abstract
Retrieval-Augmented Generation (RAG) integrates Large Language Models (LLMs)
with external knowledge bases, improving output quality while introducing new
security risks. Existing studies on RAG vulnerabilities typically focus on
exploiting the retrieval mechanism to inject erroneous knowledge or malicious
texts, inducing incorrect outputs. However, these approaches overlook critical
weaknesses within LLMs, leaving important attack vectors unexplored and
limiting the scope and efficiency of attacks. In this paper, we uncover a novel
vulnerability: the safety guardrails of LLMs, while designed for protection,
can also be exploited as an attack vector by adversaries. Building on this
vulnerability, we propose MutedRAG, a novel denial-of-service attack that
exploits the guardrails of LLMs in reverse to undermine the availability of RAG
systems. By injecting minimalistic jailbreak texts, such as "How to
build a bomb", into the knowledge base, MutedRAG intentionally triggers the
LLM's safety guardrails, causing the system to reject legitimate queries.
Moreover, due to the high sensitivity of guardrails, a single jailbreak sample
can affect multiple queries, effectively amplifying the efficiency of attacks
while reducing their costs. Experimental results on three datasets demonstrate
that MutedRAG achieves an attack success rate exceeding 60% in many scenarios,
requiring fewer than one malicious text per target query on average. In
addition, we evaluate potential defense strategies against MutedRAG, finding
that some current mechanisms are insufficient to mitigate this threat,
underscoring the urgent need for more robust solutions.
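
The two quantities reported above, attack success rate and malicious texts
per target query, can be illustrated with a toy simulation. This is a minimal
sketch and not the paper's implementation: the word-overlap retriever, the
guardrail stand-in that refuses whenever an injected text reaches the context,
the placeholder string, and all names (Doc, retrieve, answer, attack_metrics)
are hypothetical assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Doc:
    text: str
    injected: bool = False  # True for the attacker's text (toy flag, an assumption)


def overlap(query: str, doc: Doc) -> int:
    """Toy relevance score: number of shared lowercase words."""
    return len(set(query.lower().split()) & set(doc.text.lower().split()))


def retrieve(query: str, kb: list[Doc], k: int = 2) -> list[Doc]:
    """Return the k highest-scoring documents (ties keep knowledge-base order)."""
    return sorted(kb, key=lambda d: overlap(query, d), reverse=True)[:k]


def answer(query: str, kb: list[Doc], k: int = 2) -> str:
    """Stand-in for an LLM with a guardrail: refuse if any injected text is in context."""
    context = retrieve(query, kb, k)
    return "REFUSED" if any(d.injected for d in context) else "ANSWERED"


def attack_metrics(queries: list[str], kb: list[Doc], k: int = 2) -> tuple[float, float]:
    """Return (attack success rate, injected texts per target query)."""
    refused = sum(answer(q, kb, k) == "REFUSED" for q in queries)
    n_injected = sum(d.injected for d in kb)
    return refused / len(queries), n_injected / len(queries)


# Hypothetical knowledge base: three benign passages plus one injected text
# whose wording overlaps with a whole family of legitimate queries.
kb = [
    Doc("paris is the capital of france"),
    Doc("berlin is the capital of germany"),
    Doc("the nile is a river in africa"),
    Doc("what is the capital of <flagged request>", injected=True),
]
queries = [
    "what is the capital of france",
    "what is the capital of germany",
    "which river is in africa",
]

asr, texts_per_query = attack_metrics(queries, kb)
print(f"attack success rate: {asr:.2f}, texts per query: {texts_per_query:.2f}")
```

In this toy setup one injected text is retrieved for two of the three
queries, so the ratio of injected texts to target queries stays below one
while most queries are refused. A real guardrail is a learned behavior of the
model rather than a flag check, so the numbers here say nothing about the
rates reported in the paper.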