Abstract
Two-stage Learning-to-Defer (L2D) enables optimal task delegation by
assigning each input to either a fixed main model or one of several offline
experts, supporting reliable decision-making in complex, multi-agent
environments. However, existing L2D frameworks assume clean inputs and are
vulnerable to adversarial perturbations that can manipulate query
allocation, causing costly misrouting or expert overload. We present the first
comprehensive study of adversarial robustness in two-stage L2D systems. We
introduce two novel attack strategies, untargeted and targeted, which
respectively disrupt optimal allocations or force queries to specific agents.
To defend against such threats, we propose SARD, a convex learning algorithm
built on a family of surrogate losses that are provably Bayes-consistent and
$(\mathcal{R}, \mathcal{G})$-consistent. These guarantees hold across
classification, regression, and multi-task settings. Empirical results
demonstrate that SARD significantly improves robustness under adversarial
attacks while maintaining strong clean performance, marking a critical step
toward secure and trustworthy L2D deployment.
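
To make the threat model concrete, the sketch below illustrates the targeted attack idea in the simplest possible form: a PGD-style perturbation that steers a toy rejector's allocation toward a chosen agent. All names (Rejector, targeted_attack, epsilon, and the linear architecture) are illustrative assumptions for this sketch; it is not the paper's actual attack code or the SARD defense.

    # Hypothetical sketch of a targeted allocation attack on a two-stage L2D
    # router. The rejector scores the main model and each offline expert;
    # the attack perturbs the input to force deferral to a chosen agent.
    import torch
    import torch.nn as nn

    class Rejector(nn.Module):
        """Toy rejector: one score per agent (index 0 = main model)."""
        def __init__(self, in_dim: int, n_agents: int):
            super().__init__()
            self.scores = nn.Linear(in_dim, n_agents)

        def forward(self, x):
            return self.scores(x)

    def targeted_attack(rejector, x, target_agent, epsilon=0.1, steps=20, lr=0.01):
        """PGD-style perturbation, kept inside an L-inf ball of radius epsilon,
        that pushes the routing decision toward `target_agent`."""
        delta = torch.zeros_like(x, requires_grad=True)
        target = torch.full((x.size(0),), target_agent, dtype=torch.long)
        for _ in range(steps):
            logits = rejector(x + delta)
            # Minimizing cross-entropy toward the target agent forces deferral to it.
            loss = nn.functional.cross_entropy(logits, target)
            loss.backward()
            with torch.no_grad():
                delta -= lr * delta.grad.sign()
                delta.clamp_(-epsilon, epsilon)
            delta.grad.zero_()
        return (x + delta).detach()

    if __name__ == "__main__":
        torch.manual_seed(0)
        rej = Rejector(in_dim=16, n_agents=4)  # main model + 3 experts
        x = torch.randn(8, 16)
        x_adv = targeted_attack(rej, x, target_agent=2)
        print("clean routing:      ", rej(x).argmax(dim=1).tolist())
        print("adversarial routing:", rej(x_adv).argmax(dim=1).tolist())

An untargeted variant would instead maximize the loss of the clean allocation (gradient ascent away from the original routing) rather than descend toward a fixed target agent.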