Adversarial Training

TAET: Two-Stage Adversarial Equalization Training on Long-Tailed Distributions

Authors: Wang YuHang, Junkang Guo, Aolei Liu, Kaihao Wang, Zaitong Wu, Zhenyu Liu, Wenfei Yin, Jian Liu | Published: 2025-03-02 | Updated: 2025-03-21

Robustness

Adversarial Learning

Adversarial Training

2025.03.02 2025.04.03

Literature Database

“Short-length” Adversarial Training Helps LLMs Defend “Long-length” Jailbreak Attacks: Theoretical and Empirical Evidence

Authors: Shaopeng Fu, Liang Ding, Di Wang | Published: 2025-02-06

Prompt Injection

Large Language Models

Adversarial Training

2025.02.06 2025.04.03

Adversarial Robustness in Two-Stage Learning-to-Defer: Algorithms and Guarantees

Authors: Yannis Montreuil, Axel Carlier, Lai Xing Ng, Wei Tsang Ooi | Published: 2025-02-03

Learning-to-Defer

Adversarial Examples

Adversarial Training

2025.02.03 2025.04.03

Smoothed Embeddings for Robust Language Models

Authors: Ryo Hase, Md Rafi Ur Rashid, Ashley Lewis, Jing Liu, Toshiaki Koike-Akino, Kieran Parsons, Ye Wang | Published: 2025-01-27

Prompt Injection

Membership Inference

Adversarial Training

2025.01.27 2025.04.03

Latent-space adversarial training with post-aware calibration for defending large language models against jailbreak attacks

Authors: Xin Yi, Yue Li, Linlin Wang, Xiaoling Wang, Liang He | Published: 2025-01-18

Prompt Injection

Adversarial Training

Over-Refusal Mitigation

2025.01.18 2025.04.03

Standard-Deviation-Inspired Regularization for Improving Adversarial Robustness

Authors: Olukorede Fakorede, Modeste Atsague, Jin Tian | Published: 2024-12-27

Adversarial Examples

Adversarial Training

2024.12.27 2025.04.03

GLL: A Differentiable Graph Learning Layer for Neural Networks

Authors: Jason Brown, Bohan Chen, Harris Hardiman-Mostow, Jeff Calder, Andrea L. Bertozzi | Published: 2024-12-11

Poisoning

Adversarial Training

2024.12.11 2025.04.03

On the Geometry of Regularization in Adversarial Training: High-Dimensional Asymptotics and Generalization Bounds

Authors: Matteo Vilucchio, Nikolaos Tsilivis, Bruno Loureiro, Julia Kempe | Published: 2024-10-21

Convergence Analysis

Adversarial Training

2024.10.21 2025.04.03

Adversarially Robust Out-of-Distribution Detection Using Lyapunov-Stabilized Embeddings

Authors: Hossein Mirzaei, Mackenzie W. Mathis | Published: 2024-10-14 | Updated: 2025-01-26

Membership Inference

Adversarial Training

2024.10.14 2025.04.03

Towards Calibrated Losses for Adversarial Robust Reject Option Classification

Authors: Vrund Shah, Tejas Chaudhari, Naresh Manwani | Published: 2024-10-14

Adversarial Training

2024.10.14 2025.04.03
