Large Language Models (LLMs) have demonstrated remarkable performance across
a wide range of applications, e.g., medical question-answering, mathematical
sciences, and code generation. However, they also exhibit inherent limitations,
such as outdated knowledge and susceptibility to hallucinations.
Retrieval-Augmented Generation (RAG) has emerged as a promising paradigm to
address these issues, but it also introduces new vulnerabilities. Recent
efforts have focused on the security of RAG-based LLMs, yet existing attack
methods face three critical challenges: (1) their effectiveness declines
sharply when only a limited number of poisoned texts can be injected into the
knowledge database, (2) they lack sufficient stealth, as the attacks are often
detectable by anomaly detection systems, which compromises their effectiveness,
and (3) they rely on heuristic approaches to generate poisoned texts, lacking
formal optimization frameworks and theoretical guarantees, which limits their
effectiveness and applicability. To address these issues, we propose
the coordinated Prompt-RAG attack (PR-Attack), a novel optimization-driven attack
that introduces a small number of poisoned texts into the knowledge database
while embedding a backdoor trigger within the prompt. When activated, the
trigger causes the LLM to generate pre-designed responses to targeted queries,
while maintaining normal behavior in other contexts. This ensures both high
effectiveness and stealth. We formulate the attack generation process as a
bilevel optimization problem and leverage a principled optimization framework
to derive optimal poisoned texts and triggers. Extensive experiments across
diverse LLMs and datasets demonstrate the effectiveness of PR-Attack, achieving
a high attack success rate even with a limited number of poisoned texts and
significantly improved stealth compared to existing methods.
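As a rough illustration of the kind of bilevel formulation described above, a generic poisoning attack on RAG can be sketched as follows. All symbols here ($\Gamma$, $t$, $Q_s$, $\mathcal{R}$, $N$) are our own illustrative notation, not the paper's actual formulation: $\Gamma$ denotes the injected poisoned texts, $t$ the backdoor trigger, $Q_s$ the set of targeted queries, $a^{\star}(q)$ the attacker-designated response, and $N$ the injection budget; the lower-level problem models top-$k$ retrieval over the knowledge database $D$.

```latex
% Illustrative sketch only; notation is assumed, not taken from the paper.
\begin{align*}
\max_{\Gamma,\, t} \quad
  & \sum_{q \in Q_s}
    \mathbb{1}\!\left[\,
      \mathrm{LLM}\big(p(q, t),\, \mathcal{R}^{\star}(q)\big) = a^{\star}(q)
    \,\right]
  && \text{(upper level: attack success on targeted queries)} \\
\text{s.t.} \quad
  & \mathcal{R}^{\star}(q)
    = \operatorname*{arg\,top\text{-}k}_{d \,\in\, D \cup \Gamma}
      \ \mathrm{sim}(q, d),
  && \text{(lower level: retrieval over the poisoned database)} \\
  & |\Gamma| \le N,
  && \text{(limited number of poisoned texts)}
\end{align*}
```

Under this reading, the trigger $t$ embedded in the prompt $p(q, t)$ activates the pre-designed response only for targeted queries, while behavior on untriggered inputs stays normal.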