Beyond Surface Alignment: Rebuilding LLMs Safety Mechanism via Probabilistically Ablating Refusal Direction Authors: Yuanbo Xie, Yingjie Zhang, Tianyun Liu, Duohe Ma, Tingwen Liu | Published: 2025-09-18 Prompt InjectionSafety Alignment拒否メカニズム 2025.09.18 2025.09.20 Literature Database