Immune: Improving Safety Against Jailbreaks in Multi-modal LLMs via Inference-Time Alignment
Authors: Soumya Suvra Ghosal, Souradip Chakraborty, Vaibhav Singh, Tianrui Guan, Mengdi Wang, Ahmad Beirami, Furong Huang, Alvaro Velasquez, Dinesh Manocha, Amrit Singh Bedi | Published: 2024-11-27 | Updated: 2025-03-20
Prompt Injection
Safety Alignment
Adversarial Attacks