Bergeron: Combating Adversarial Attacks through a Conscience-Based Alignment Framework
Authors: Matthew Pisano, Peter Ly, Abraham Sanders, Bingsheng Yao, Dakuo Wang, Tomek Strzalkowski, Mei Si | Published: 2023-11-16 | Updated: 2024-08-18
Prompt Injection
Multilingual LLM Jailbreak
Adversarial Attack