Abstract
In order to train networks for verified adversarial robustness, it is common
to over-approximate the worst-case loss over perturbation regions, resulting in
networks that attain verifiability at the expense of standard performance. As
shown in recent work, better trade-offs between accuracy and robustness can be
obtained by carefully coupling adversarial training with over-approximations.
We hypothesize that the expressivity of a loss function, which we formalize as
the ability to span a range of trade-offs between lower and upper bounds to the
worst-case loss through a single parameter (the over-approximation
coefficient), is key to attaining state-of-the-art performance. To support our
hypothesis, we show that trivial expressive losses, obtained via convex
combinations between adversarial attacks and interval bound propagation (IBP)
bounds, yield state-of-the-art results across a variety of settings in spite of
their conceptual simplicity.
We provide a detailed analysis of the relationship between the
over-approximation coefficient and performance profiles across different
expressive losses, showing that, while expressivity is essential, better
approximations of the worst-case loss are not necessarily linked to superior
robustness-accuracy trade-offs.
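The convex combination described above can be illustrated with a minimal, self-contained sketch. Everything in it is a hypothetical toy setting, not the authors' implementation: a three-class linear model with made-up weights, an L-infinity perturbation ball of radius `eps`, a single FGSM-style signed-gradient step standing in for the adversarial attack, and interval bound propagation through the one linear layer. The helper names (`fgsm_point`, `ibp_loss`, `expressive_loss`) are illustrative. Any attacked point lies inside the ball, so its loss is a lower bound on the worst-case loss. The IBP loss is an upper bound. The coefficient `alpha` moves the combined loss between the two.

```python
import math

# Hypothetical toy setting (not from the paper): a 3-class linear model
# z = W x + b on 2-D inputs, perturbations in an L-infinity ball of radius eps.
W = [[1.0, -2.0], [0.5, 1.5], [-1.0, 0.3]]
b = [0.1, -0.2, 0.0]


def logits(x):
    return [sum(wi * xi for wi, xi in zip(row, x)) + bj for row, bj in zip(W, b)]


def cross_entropy(z, y):
    # log-sum-exp(z) - z_y, computed stably by shifting by max(z).
    m = max(z)
    return m + math.log(sum(math.exp(zj - m) for zj in z)) - z[y]


def _sign(g):
    return (g > 0) - (g < 0)


def fgsm_point(x, y, eps):
    # One signed-gradient step to the boundary of the eps-ball (FGSM-style).
    # For a linear model, d CE / dx = W^T (softmax(z) - onehot(y)).
    z = logits(x)
    m = max(z)
    exps = [math.exp(zj - m) for zj in z]
    total = sum(exps)
    delta = [e / total - (1.0 if j == y else 0.0) for j, e in enumerate(exps)]
    grad = [sum(W[j][i] * delta[j] for j in range(len(W))) for i in range(len(x))]
    return [xi + eps * _sign(gi) for xi, gi in zip(x, grad)]


def ibp_logit_bounds(x, eps):
    # Interval bounds for a linear layer over the box [x - eps, x + eps]:
    # centre W x + b, radius eps * ||row||_1 for each output.
    centre = logits(x)
    radius = [eps * sum(abs(w) for w in row) for row in W]
    lower = [c - r for c, r in zip(centre, radius)]
    upper = [c + r for c, r in zip(centre, radius)]
    return lower, upper


def ibp_loss(x, y, eps):
    # Worst-case logits: lower bound for the true class, upper bounds elsewhere.
    # This upper-bounds the cross-entropy at every point of the box.
    lower, upper = ibp_logit_bounds(x, eps)
    worst = [lower[j] if j == y else upper[j] for j in range(len(lower))]
    return cross_entropy(worst, y)


def expressive_loss(x, y, eps, alpha):
    # Convex combination with over-approximation coefficient alpha in [0, 1]:
    # alpha = 0 gives the adversarial loss (a lower bound on the worst case),
    # alpha = 1 gives the IBP loss (an upper bound on the worst case).
    adv = cross_entropy(logits(fgsm_point(x, y, eps)), y)
    return (1.0 - alpha) * adv + alpha * ibp_loss(x, y, eps)


if __name__ == "__main__":
    x, y, eps = [0.4, -0.3], 0, 0.1
    for alpha in (0.0, 0.25, 0.5, 1.0):
        print(alpha, round(expressive_loss(x, y, eps, alpha), 4))
```

This sketch takes the combination at the loss level; a combination could also be formed elsewhere (for instance on logit differences), which is not shown here. A stronger attack, such as multi-step PGD, would tighten the lower bound without changing the structure of the loss.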