Abstract
Two-stage Learning-to-Defer (L2D) enables optimal task delegation by
assigning each input to either a fixed main model or one of several offline
experts, supporting reliable decision-making in complex, multi-agent
environments. However, existing L2D frameworks assume clean inputs and are
vulnerable to adversarial perturbations that can manipulate query
allocation, causing costly misrouting or expert overload. We present the first
comprehensive study of adversarial robustness in two-stage L2D systems. We
introduce two novel attack strategies, untargeted and targeted, which
respectively disrupt optimal allocations or force queries to specific agents.
To defend against such threats, we propose SARD, a convex learning algorithm
built on a family of surrogate losses that are provably Bayes-consistent and
$(\mathcal{R}, \mathcal{G})$-consistent. These guarantees hold across
classification, regression, and multi-task settings. Empirical results
demonstrate that SARD significantly improves robustness under adversarial
attacks while maintaining strong clean performance, marking a critical step
toward secure and trustworthy L2D deployment.
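
To make the threat model concrete, the sketch below illustrates the targeted attack idea in the simplest possible form: a PGD-style perturbation that steers a toy rejector's allocation toward a chosen agent. All names (Rejector, targeted_attack, epsilon, and the linear architecture) are illustrative assumptions for this sketch; it is not the paper's actual attack code or the SARD defense.

    # Hypothetical sketch of a targeted allocation attack on a two-stage L2D
    # router. The rejector scores the main model and each offline expert;
    # the attack perturbs the input to force deferral to a chosen agent.
    import torch
    import torch.nn as nn

    class Rejector(nn.Module):
        """Toy rejector: one score per agent (index 0 = main model)."""
        def __init__(self, in_dim: int, n_agents: int):
            super().__init__()
            self.scores = nn.Linear(in_dim, n_agents)

        def forward(self, x):
            return self.scores(x)

    def targeted_attack(rejector, x, target_agent, epsilon=0.1, steps=20, lr=0.01):
        """PGD-style perturbation, kept inside an L-inf ball of radius epsilon,
        that pushes the routing decision toward `target_agent`."""
        delta = torch.zeros_like(x, requires_grad=True)
        target = torch.full((x.size(0),), target_agent, dtype=torch.long)
        for _ in range(steps):
            logits = rejector(x + delta)
            # Minimizing cross-entropy toward the target agent forces deferral to it.
            loss = nn.functional.cross_entropy(logits, target)
            loss.backward()
            with torch.no_grad():
                delta -= lr * delta.grad.sign()
                delta.clamp_(-epsilon, epsilon)
            delta.grad.zero_()
        return (x + delta).detach()

    if __name__ == "__main__":
        torch.manual_seed(0)
        rej = Rejector(in_dim=16, n_agents=4)  # main model + 3 experts
        x = torch.randn(8, 16)
        x_adv = targeted_attack(rej, x, target_agent=2)
        print("clean routing:      ", rej(x).argmax(dim=1).tolist())
        print("adversarial routing:", rej(x_adv).argmax(dim=1).tolist())

An untargeted variant would instead maximize the loss of the clean allocation (gradient ascent away from the original routing) rather than descend toward a fixed target agent.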