
AutoDAN: Interpretable Gradient-Based Adversarial Attacks on Large Language Models
Authors: Sicheng Zhu, Ruiyi Zhang, Bang An, Gang Wu, Joe Barrow, Zichao Wang, Furong Huang, Ani Nenkova, Tong Sun | Published: 2023-10-23 | Updated: 2023-12-14