Abstract
Distribution shifts and adversarial examples are two major challenges for
deploying machine learning models. While these challenges have been studied
individually, their combination is an important topic that remains relatively
under-explored. In this work, we study the problem of adversarial robustness
under a common distribution-shift setting: unsupervised domain adaptation
(UDA). Specifically, given a labeled source domain $D_S$ and an unlabeled
target domain $D_T$ with related but different distributions, the goal is to
obtain an adversarially robust model for $D_T$. The absence of target domain
labels poses a unique challenge, as conventional adversarial robustness
defenses cannot be directly applied to $D_T$. To address this challenge, we
first establish a generalization bound for the adversarial target loss, which
consists of (i) terms related to the loss on the data, and (ii) a measure of
worst-case domain divergence. Motivated by this bound, we develop a novel
unified defense framework called Divergence Aware adveRsarial Training (DART),
which can be used in conjunction with a variety of standard UDA methods, e.g.,
DANN [Ganin and Lempitsky, 2015]. DART is applicable to general threat models,
including the popular $\ell_p$-norm model, and does not require heuristic
regularizers or architectural changes. We also release DomainRobust: a testbed
for evaluating the robustness of UDA models to adversarial attacks. DomainRobust
consists of 4 multi-domain benchmark datasets (with 46 source-target pairs) and
7 meta-algorithms with a total of 11 variants. Our large-scale experiments
demonstrate that, on average, DART significantly enhances model robustness on
all benchmarks compared to the state of the art, while maintaining competitive
standard accuracy. The relative improvement in robustness from DART reaches up
to 29.2% on the source-target domain pairs considered.
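The $\ell_p$-norm threat model mentioned in the abstract lets an adversary replace an input $x$ with any $x'$ satisfying $\|x' - x\|_p \le \epsilon$. The sketch below is a generic illustration of that threat model, not DART and not code from DomainRobust. It runs an $\ell_\infty$ projected gradient descent (PGD) attack [Madry et al., 2018] against a toy logistic-regression classifier in plain Python. The model, its weights, and the values of `epsilon`, `step_size` and `num_steps` are hypothetical choices made only for this example.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(w: list[float], b: float, x: list[float]) -> float:
    """P(y = 1 | x) for a hypothetical linear logistic-regression model."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def logistic_loss(w: list[float], b: float, x: list[float], y: int) -> float:
    """Binary cross-entropy for a label y in {0, 1}."""
    p = predict_proba(w, b, x)
    tiny = 1e-12  # guards against log(0)
    return -(y * math.log(p + tiny) + (1 - y) * math.log(1 - p + tiny))


def loss_grad_x(w: list[float], b: float, x: list[float], y: int) -> list[float]:
    """Gradient of the logistic loss with respect to the input x: (p - y) * w."""
    p = predict_proba(w, b, x)
    return [(p - y) * wi for wi in w]


def sign(v: float) -> int:
    return (v > 0) - (v < 0)


def pgd_linf(w, b, x, y, epsilon, step_size, num_steps):
    """Approximately maximize the loss over {x' : ||x' - x||_inf <= epsilon}."""
    x_adv = list(x)
    for _ in range(num_steps):
        g = loss_grad_x(w, b, x_adv, y)
        # Ascent step along sign(gradient): the steepest-ascent direction under l_inf.
        x_adv = [xa + step_size * sign(gi) for xa, gi in zip(x_adv, g)]
        # Projection onto the l_inf ball: clip each coordinate to [x_i - eps, x_i + eps].
        x_adv = [min(max(xa, xi - epsilon), xi + epsilon) for xa, xi in zip(x_adv, x)]
    return x_adv


# Usage with made-up weights and a single labeled point.
w, b = [2.0, -1.0], 0.0
x, y = [0.5, 0.2], 1
x_adv = pgd_linf(w, b, x, y, epsilon=0.1, step_size=0.05, num_steps=10)
print([round(v, 3) for v in x_adv])  # [0.4, 0.3]
print(logistic_loss(w, b, x_adv, y) > logistic_loss(w, b, x, y))  # True
```

Each step moves along the sign of the input gradient, which is the steepest-ascent direction under the $\ell_\infty$ norm, and then clips back into the $\epsilon$-ball. Adversarial training would then minimize the loss on such perturbed inputs. Other $\ell_p$ norms would need a different step direction and projection.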