Improved Techniques for Optimization-Based Jailbreaking on Large Language Models

Authors: Xiaojun Jia, Tianyu Pang, Chao Du, Yihao Huang, Jindong Gu, Yang Liu, Xiaochun Cao, Min Lin | Published: 2024-05-31 | Updated: 2024-06-05

ACE: A Model Poisoning Attack on Contribution Evaluation Methods in Federated Learning

Authors: Zhangchen Xu, Fengqing Jiang, Luyao Niu, Jinyuan Jia, Bo Li, Radha Poovendran | Published: 2024-05-31 | Updated: 2024-06-05

Phantom: General Backdoor Attacks on Retrieval Augmented Language Generation

Authors: Harsh Chaudhari, Giorgio Severi, John Abascal, Anshuman Suri, Matthew Jagielski, Christopher A. Choquette-Choo, Milad Nasr, Cristina Nita-Rotaru, Alina Oprea | Published: 2024-05-30 | Updated: 2025-10-01

Defensive Prompt Patch: A Robust and Interpretable Defense of LLMs against Jailbreak Attacks

Authors: Chen Xiong, Xiangyu Qi, Pin-Yu Chen, Tsung-Yi Ho | Published: 2024-05-30 | Updated: 2025-06-04

Robust Kernel Hypothesis Testing under Data Corruption

Authors: Antonin Schrab, Ilmun Kim | Published: 2024-05-30

Efficient Black-box Adversarial Attacks via Bayesian Optimization Guided by a Function Prior

Authors: Shuyu Cheng, Yibo Miao, Yinpeng Dong, Xiao Yang, Xiao-Shan Gao, Jun Zhu | Published: 2024-05-29

Toxicity Detection for Free

Authors: Zhanhao Hu, Julien Piet, Geng Zhao, Jiantao Jiao, David Wagner | Published: 2024-05-29 | Updated: 2024-11-08

PureGen: Universal Data Purification for Train-Time Poison Defense via Generative Model Dynamics

Authors: Sunay Bhat, Jeffrey Jiang, Omead Pooladzandi, Alexander Branch, Gregory Pottie | Published: 2024-05-28 | Updated: 2024-06-02

Cross-Modal Safety Alignment: Is textual unlearning all you need?

Authors: Trishna Chakraborty, Erfan Shayegani, Zikui Cai, Nael Abu-Ghazaleh, M. Salman Asif, Yue Dong, Amit K. Roy-Chowdhury, Chengyu Song | Published: 2024-05-27 | Updated: 2025-10-14

Can We Trust Embodied Agents? Exploring Backdoor Attacks against Embodied LLM-based Decision-Making Systems

Authors: Ruochen Jiao, Shaoyuan Xie, Justin Yue, Takami Sato, Lixu Wang, Yixuan Wang, Qi Alfred Chen, Qi Zhu | Published: 2024-05-27 | Updated: 2024-10-05