Robustness

DeepTrust: Multi-Step Classification through Dissimilar Adversarial Representations for Robust Android Malware Detection

Authors: Daniel Pulido-Cortázar, Daniel Gibert, Felip Manyà | Published: 2025-10-14
Backdoor Detection
Certified Robustness
Robustness

Large Language Models Are Effective Code Watermarkers

Authors: Rui Xu, Jiawei Chen, Zhaoxia Yin, Cong Kong, Xinpeng Zhang | Published: 2025-10-13
Prompt leaking
Robustness
Digital Watermarking for Generative AI

Adversarial Robustness in One-Stage Learning-to-Defer

Authors: Yannis Montreuil, Letian Yu, Axel Carlier, Lai Xing Ng, Wei Tsang Ooi | Published: 2025-10-13
Robustness
Adversarial Learning
Defense Mechanism

MetaDefense: Defending Finetuning-based Jailbreak Attack Before and During Generation

Authors: Weisen Jiang, Sinno Jialin Pan | Published: 2025-10-09
Prompt Injection
Robustness
Defense Mechanism

Machine Unlearning Meets Adversarial Robustness via Constrained Interventions on LLMs

Authors: Fatmazohra Rezkellah, Ramzi Dakhmouche | Published: 2025-10-03 | Updated: 2025-10-15
Identification of AI Output
Robustness
Large Language Model

Defending MoE LLMs against Harmful Fine-Tuning via Safety Routing Alignment

Authors: Jaehan Kim, Minkyoo Song, Seungwon Shin, Sooel Son | Published: 2025-09-26 | Updated: 2025-10-09
Bias Detection in AI Output
Robustness
Defense Mechanism

Adversarial Distilled Retrieval-Augmented Guarding Model for Online Malicious Intent Detection

Authors: Yihao Guo, Haocheng Bian, Liutong Zhou, Ze Wang, Zhaoyi Zhang, Francois Kawala, Milan Dean, Ian Fischer, Yuantao Peng, Noyan Tokgozoglu, Ivan Barrientos, Riyaaz Shaik, Rachel Li, Chandru Venkataraman, Reza Shifteh Far, Moses Pawar, Venkat Sundaranatha, Michael Xu, Frank Chu | Published: 2025-09-18
Poisoning attack on RAG
Online Learning
Robustness

TAET: Two-Stage Adversarial Equalization Training on Long-Tailed Distributions

Authors: Wang YuHang, Junkang Guo, Aolei Liu, Kaihao Wang, Zaitong Wu, Zhenyu Liu, Wenfei Yin, Jian Liu | Published: 2025-03-02 | Updated: 2025-03-21
Robustness
Adversarial Learning
Adversarial Training

Reinforcement Unlearning

Authors: Dayong Ye, Tianqing Zhu, Congcong Zhu, Derui Wang, Kun Gao, Zewei Shi, Sheng Shen, Wanlei Zhou, Minhui Xue | Published: 2023-12-26 | Updated: 2024-09-09
Robustness
Reinforcement Learning
Complexity of the Environment

Understanding Overfitting in Adversarial Training via Kernel Regression

Authors: Teng Zhang, Kang Li | Published: 2023-04-13 | Updated: 2023-04-19
Watermarking
Robustness
Regularization