Fine-tuning Aligned Language Models Compromises Safety, Even When Users Do Not Intend To!
Authors: Xiangyu Qi, Yi Zeng, Tinghao Xie, Pin-Yu Chen, Ruoxi Jia, Prateek Mittal, Peter Henderson | Published: 2023-10-05
Data Collection
Prompt Injection
Information Gathering Methods