From Defender to Devil? Unintended Risk Interactions Induced by LLM Defenses
Authors: Xiangtao Meng, Tianshuo Cong, Li Wang, Wenyu Chen, Zheng Li, Shanqing Guo, Xiaoyun Wang | Published: 2025-10-09
Alignment
Indirect Prompt Injection
Defense Effectiveness Analysis