Are aligned neural networks adversarially aligned?
Authors: Nicholas Carlini, Milad Nasr, Christopher A. Choquette-Choo, Matthew Jagielski, Irena Gao, Anas Awadalla, Pang Wei Koh, Daphne Ippolito, Katherine Lee, Florian Tramer, Ludwig Schmidt | Published: 2023-06-26 | Updated: 2024-05-06
プロンプトインジェクション
敵対的サンプル
敵対的攻撃手法