Latent-space adversarial training with post-aware calibration for defending large language models against jailbreak attacks
Authors: Xin Yi, Yue Li, Linlin Wang, Xiaoling Wang, Liang He | Published: 2025-01-18
Tags: Prompt Injection, Adversarial Training, Over-Refusal Mitigation
2025.01.18 2025.04.03 Literature Database