Abstract
Making neural networks robust against adversarial inputs has resulted in an
arms race between new defenses and attacks. The most promising defenses,
adversarially robust training and verifiably robust training, have limitations
that restrict their practical applications. Adversarially robust training
only makes networks robust against a subclass of attackers, and we reveal
such weaknesses by developing a new attack based on interval gradients. By
contrast, verifiably robust training provides protection against any $L_p$
norm-bounded attacker but incurs orders of magnitude more computational and
memory overhead than adversarially robust training.
We propose two novel techniques, stochastic robust approximation and dynamic
mixed training, to drastically improve the efficiency of verifiably robust
training without sacrificing verified robustness. We leverage two critical
insights: (1) sound over-approximations over randomly subsampled training
data points, rather than over the entire training set, are sufficient to
guide the robust training process efficiently; and (2) test accuracy and
verified robustness often conflict after certain training epochs, so we use
a dynamic loss function to adaptively balance them at each epoch.
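To make the two techniques concrete, the sketch below shows one training step in PyTorch-style Python. It is a minimal illustration under stated assumptions, not MixTrain's actual implementation: it stands in naive interval bound propagation for the paper's sound over-approximation, and the helper names (`interval_bounds`, `verified_robust_loss`, `mixtrain_step`), the subsampling ratio `k`, and the per-epoch mixing weight `alpha` are all illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def interval_bounds(model: nn.Sequential, x: torch.Tensor, eps: float):
    """Propagate the L-infinity ball [x - eps, x + eps] through a stack of
    Linear/ReLU layers with interval arithmetic, giving sound elementwise
    lower/upper bounds on the logits (the simplest sound over-approximation;
    the paper's analysis may be tighter)."""
    lb, ub = x - eps, x + eps
    for layer in model:
        if isinstance(layer, nn.Linear):
            center = (lb + ub) / 2 @ layer.weight.t() + layer.bias
            radius = (ub - lb) / 2 @ layer.weight.abs().t()
            lb, ub = center - radius, center + radius
        elif isinstance(layer, nn.ReLU):
            lb, ub = lb.clamp(min=0), ub.clamp(min=0)
        else:
            raise ValueError(f"unsupported layer: {layer!r}")
    return lb, ub

def verified_robust_loss(model, x, y, eps):
    """Upper-bound the worst-case loss inside the eps-ball: score every
    wrong class by its logit upper bound and the true class by its lower
    bound, then take the cross-entropy of that worst-case logit vector."""
    lb, ub = interval_bounds(model, x, eps)
    worst = ub.scatter(1, y.unsqueeze(1), lb.gather(1, y.unsqueeze(1)))
    return F.cross_entropy(worst, y)

def mixtrain_step(model, opt, x, y, eps, alpha, k=0.1):
    """One step combining both techniques. Stochastic robust approximation:
    the expensive verified loss is computed on a random fraction k of the
    batch. Dynamic mixed training: blend natural and verified losses with
    a per-epoch weight alpha in [0, 1] (k and alpha are illustrative)."""
    natural = F.cross_entropy(model(x), y)
    n = max(1, int(k * x.size(0)))
    idx = torch.randperm(x.size(0), device=x.device)[:n]
    robust = verified_robust_loss(model, x[idx], y[idx], eps)
    loss = alpha * natural + (1 - alpha) * robust
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()

# Toy usage: a small fully connected MNIST-sized classifier.
model = nn.Sequential(nn.Linear(784, 128), nn.ReLU(), nn.Linear(128, 10))
opt = torch.optim.SGD(model.parameters(), lr=0.1)
x, y = torch.rand(64, 784), torch.randint(0, 10, (64,))
mixtrain_step(model, opt, x, y, eps=0.1, alpha=0.8, k=0.1)
```

The design point the sketch illustrates is that the expensive verified loss touches only a small random fraction of each batch, while `alpha` can be re-chosen every epoch as test accuracy and verified robustness begin to trade off.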
We designed and implemented our techniques as part of MixTrain and evaluated
it on six networks trained on three popular datasets: MNIST, CIFAR,
and ImageNet-200. Our evaluations show that MixTrain can achieve up to $95.2\%$
verified robust accuracy against $L_\infty$ norm-bounded attackers while requiring
$15\times$ and $3\times$ less training time than state-of-the-art verifiably robust
training and adversarially robust training schemes, respectively. Furthermore,
MixTrain easily scales to larger networks, such as the one trained on ImageNet-200,
significantly outperforming existing verifiably robust training methods.