拒否メカニズム

Beyond Surface Alignment: Rebuilding LLMs Safety Mechanism via Probabilistically Ablating Refusal Direction

Authors: Yuanbo Xie, Yingjie Zhang, Tianyun Liu, Duohe Ma, Tingwen Liu | Published: 2025-09-18
Prompt Injection
Safety Alignment
拒否メカニズム