Prompt injection attacks, in which an attacker injects a prompt into the original
one to make a Large Language Model (LLM) follow the injected prompt and perform
an attacker-chosen task, represent a critical security threat. Existing
attacks primarily focus on crafting these injections at inference time,
treating the LLM itself as a static target. Our experiments show that these
attacks achieve some success, but there is still significant room for
improvement. In this work, we introduce a more foundational attack vector:
poisoning the LLM's alignment process to amplify the success of future prompt
injection attacks. Specifically, we propose PoisonedAlign, a method that
strategically creates poisoned alignment samples to poison an LLM's alignment
dataset. Our experiments across five LLMs and two alignment datasets show that
when even a small fraction of the alignment data is poisoned, the resulting
model becomes substantially more vulnerable to a wide range of prompt injection
attacks. Crucially, this vulnerability is instilled while the LLM's performance
on standard capability benchmarks remains largely unchanged, making the
manipulation difficult to detect through automated, general-purpose performance
evaluations. The code for implementing the attack is available at
https://github.com/Sadcardation/PoisonedAlign.