Adversarial Training

TAET: Two-Stage Adversarial Equalization Training on Long-Tailed Distributions

Authors: Wang YuHang, Junkang Guo, Aolei Liu, Kaihao Wang, Zaitong Wu, Zhenyu Liu, Wenfei Yin, Jian Liu | Published: 2025-03-02 | Updated: 2025-03-21
Robustness
Adversarial Learning
Adversarial Training

“Short-length” Adversarial Training Helps LLMs Defend “Long-length” Jailbreak Attacks: Theoretical and Empirical Evidence

Authors: Shaopeng Fu, Liang Ding, Di Wang | Published: 2025-02-06
Prompt Injection
Large Language Model
Adversarial Training

Adversarial Robustness in Two-Stage Learning-to-Defer: Algorithms and Guarantees

Authors: Yannis Montreuil, Axel Carlier, Lai Xing Ng, Wei Tsang Ooi | Published: 2025-02-03
Learning-to-Defer
Adversarial Example
Adversarial Training

Smoothed Embeddings for Robust Language Models

Authors: Ryo Hase, Md Rafi Ur Rashid, Ashley Lewis, Jing Liu, Toshiaki Koike-Akino, Kieran Parsons, Ye Wang | Published: 2025-01-27
Prompt Injection
Membership Inference
Adversarial Training

Latent-space adversarial training with post-aware calibration for defending large language models against jailbreak attacks

Authors: Xin Yi, Yue Li, Linlin Wang, Xiaoling Wang, Liang He | Published: 2025-01-18
Prompt Injection
Adversarial Training
Excessive Denial Mitigation

Standard-Deviation-Inspired Regularization for Improving Adversarial Robustness

Authors: Olukorede Fakorede, Modeste Atsague, Jin Tian | Published: 2024-12-27
Adversarial Example
Adversarial Training

GLL: A Differentiable Graph Learning Layer for Neural Networks

Authors: Jason Brown, Bohan Chen, Harris Hardiman-Mostow, Jeff Calder, Andrea L. Bertozzi | Published: 2024-12-11
Poisoning
Adversarial Training

On the Geometry of Regularization in Adversarial Training: High-Dimensional Asymptotics and Generalization Bounds

Authors: Matteo Vilucchio, Nikolaos Tsilivis, Bruno Loureiro, Julia Kempe | Published: 2024-10-21
Convergence Analysis
Adversarial Training

Adversarially Robust Out-of-Distribution Detection Using Lyapunov-Stabilized Embeddings

Authors: Hossein Mirzaei, Mackenzie W. Mathis | Published: 2024-10-14 | Updated: 2025-01-26
Membership Inference
Adversarial Training

Towards Calibrated Losses for Adversarial Robust Reject Option Classification

Authors: Vrund Shah, Tejas Chaudhari, Naresh Manwani | Published: 2024-10-14
Adversarial Training