To train networks for verified adversarial robustness, it is common
to over-approximate the worst-case loss over perturbation regions, resulting in
networks that attain verifiability at the expense of standard performance. As
shown in recent work, better trade-offs between accuracy and robustness can be
obtained by carefully coupling adversarial training with over-approximations.
We hypothesize that the expressivity of a loss function, which we formalize as
the ability to span a range of trade-offs between lower and upper bounds to the
worst-case loss through a single parameter (the over-approximation
coefficient), is key to attaining state-of-the-art performance. To support our
hypothesis, we show that trivial expressive losses, obtained via convex
combinations between adversarial attacks and IBP bounds, yield state-of-the-art
results across a variety of settings in spite of their conceptual simplicity.
We provide a detailed analysis of the relationship between the
over-approximation coefficient and performance profiles across different
expressive losses, showing that, while expressivity is essential, better
approximations of the worst-case loss are not necessarily linked to superior
robustness-accuracy trade-offs.
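
As a minimal sketch of the idea above, the following Python snippet interpolates between a lower bound on the worst-case loss (from an adversarial attack) and an upper bound (from IBP) using a single coefficient. The function name `convex_combination_loss`, the parameter name `alpha`, and the numeric loss values are illustrative assumptions rather than anything taken from the paper, and the combination is applied here directly to scalar loss values, which is only one of the ways such a convex combination could be formed.

```python
def convex_combination_loss(adv_loss: float, ibp_loss: float, alpha: float) -> float:
    """Hypothetical helper: convex combination of two bounds on the worst-case loss.

    adv_loss: loss at an adversarial example (a lower bound on the worst-case loss).
    ibp_loss: loss computed from IBP bounds (an upper bound on the worst-case loss).
    alpha:    over-approximation coefficient in [0, 1] (name assumed for illustration).
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    # alpha = 0 recovers the adversarial loss, alpha = 1 recovers the IBP loss.
    return (1.0 - alpha) * adv_loss + alpha * ibp_loss


# Made-up values for illustration only; the lower bound must not exceed the upper bound.
adv_loss, ibp_loss = 0.4, 2.0
alphas = (0.0, 0.25, 0.5, 1.0)
losses = [convex_combination_loss(adv_loss, ibp_loss, a) for a in alphas]

for a, value in zip(alphas, losses):
    print(f"alpha={a:.2f} -> loss={value:.3f}")
```

Sweeping `alpha` from 0 to 1 moves the training objective continuously and monotonically from the adversarial lower bound to the IBP upper bound, which is the single-parameter range of trade-offs the notion of expressivity refers to.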