Abstract
Learning-to-Defer (L2D) enables hybrid decision-making by routing inputs
either to a predictor or to external experts. While promising, L2D is highly
vulnerable to adversarial perturbations, which can not only flip predictions
but also manipulate deferral decisions. Prior robustness analyses focus solely
on two-stage settings, leaving open the end-to-end (one-stage) case where
predictor and allocation are trained jointly. We introduce the first framework
for adversarial robustness in one-stage L2D, covering both classification and
regression. Our approach formalizes attacks, proposes cost-sensitive
adversarial surrogate losses, and establishes theoretical guarantees including
$\mathcal{H}$-consistency, $(\mathcal{R}, \mathcal{F})$-consistency, and Bayes consistency.
Experiments on benchmark datasets confirm that our methods improve robustness
against untargeted and targeted attacks while preserving clean performance.