Abstract
Recently, the field of adversarial machine learning has garnered attention
by showing that state-of-the-art deep neural networks are vulnerable to
adversarial examples, which are produced by adding small perturbations to the
input image. An adversary generates such examples either by using access to
the model, such as its parameters and gradient information, to alter the
input, or by attacking a substitute model and transferring the resulting
examples to the victim model. Specifically, one of these
attack algorithms, Robust Physical Perturbations ($RP_2$), generates
adversarial images of stop signs with black and white stickers to achieve high
targeted misclassification rates against standard-architecture traffic sign
classifiers. In this paper, we propose BlurNet, a defense against the $RP_2$
attack. First, we motivate the defense with a frequency analysis of the
first-layer feature maps of the network on the LISA dataset, which shows that
high-frequency noise is introduced into the input image by the $RP_2$ algorithm.
To remove the high-frequency noise, we introduce a depthwise convolution layer of
standard blur kernels after the first layer. We perform a black-box transfer
attack to show that low-pass filtering the feature maps is more effective than
filtering the input. We then present various regularization schemes to
incorporate this low-pass filtering behavior into the training regime of the
network and perform white-box attacks. We conclude with an adaptive attack
evaluation to show that the success rate of the attack drops from 90\% to 20\%
with total variation regularization, one of the proposed defenses.
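
To make the two central ideas concrete, the following is a minimal sketch in plain Python, not the authors' implementation. It applies a single blur kernel to each feature-map channel independently (a depthwise filter) and measures total variation before and after. The 3x3 box kernel, the zero padding, the anisotropic form of total variation, and all function names are assumptions chosen for illustration; the abstract does not specify which kernels, padding, or total variation formulation BlurNet uses.

```python
# Illustrative sketch only; kernel choice, padding, and names are assumptions.

def box_kernel(k):
    """Return a k x k averaging (box) blur kernel whose weights sum to 1."""
    w = 1.0 / (k * k)
    return [[w] * k for _ in range(k)]

def blur_channel(fmap, kernel):
    """Filter one 2-D feature map with zero padding (output keeps its size).

    The box kernel is symmetric, so correlation and convolution coincide.
    """
    h, w = len(fmap), len(fmap[0])
    k = len(kernel)
    r = k // 2
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for di in range(k):
                for dj in range(k):
                    y, x = i + di - r, j + dj - r
                    if 0 <= y < h and 0 <= x < w:
                        acc += kernel[di][dj] * fmap[y][x]
            out[i][j] = acc
    return out

def depthwise_blur(fmaps, kernel):
    """Apply the same kernel to every channel independently (depthwise)."""
    return [blur_channel(channel, kernel) for channel in fmaps]

def total_variation(fmap):
    """Anisotropic total variation: sum of absolute neighbor differences."""
    h, w = len(fmap), len(fmap[0])
    tv = 0.0
    for i in range(h):
        for j in range(w):
            if i + 1 < h:
                tv += abs(fmap[i + 1][j] - fmap[i][j])
            if j + 1 < w:
                tv += abs(fmap[i][j + 1] - fmap[i][j])
    return tv

# A checkerboard is the highest-frequency pattern a pixel grid can hold.
checker = [[float((i + j) % 2) for j in range(6)] for i in range(6)]
blurred = depthwise_blur([checker], box_kernel(3))[0]

print("TV before blur:", total_variation(checker))
print("TV after blur: ", total_variation(blurred))
```

In this toy setting the blur sharply reduces the total variation of the high-frequency pattern, which illustrates why a total variation penalty on feature maps can encourage low-pass behavior during training. A real model would use a framework's depthwise convolution layer instead of explicit loops.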