Recently, the field of adversarial machine learning has attracted attention by
showing that state-of-the-art deep neural networks are vulnerable to
adversarial examples, which are crafted by adding small perturbations to the
input image. A malicious adversary generates adversarial examples either by
exploiting access to the victim model's internals, such as gradient
information, to alter the input, or by attacking a substitute model and
transferring the resulting malicious examples to the victim model. Specifically, one of these
attack algorithms, Robust Physical Perturbations ($RP_2$), generates
adversarial images of stop signs with black and white stickers to achieve high
targeted misclassification rates against standard-architecture traffic sign
classifiers. In this paper, we propose BlurNet, a defense against the $RP_2$
attack. First, we motivate the defense with a frequency analysis of the
network's first-layer feature maps on the LISA dataset, which shows that the
$RP_2$ algorithm introduces high-frequency noise into the input image.
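As a concrete illustration of this analysis (a minimal sketch, not code from the paper), the snippet below compares the fraction of high-frequency spectral energy in first-layer feature maps for clean versus perturbed inputs. PyTorch is an assumption, and `high_freq_energy`, `cutoff`, and `model.first_conv` are hypothetical names.

```python
import torch

def high_freq_energy(feature_maps, cutoff=0.25):
    """Fraction of spectral energy above a radial frequency cutoff.

    feature_maps: (N, C, H, W) activations, e.g. from the first conv layer.
    cutoff: normalized radius in (0, 0.5) separating low from high frequencies.
    """
    # 2D FFT per feature map, shifted so the zero frequency sits at the center.
    spec = torch.fft.fftshift(torch.fft.fft2(feature_maps), dim=(-2, -1))
    power = spec.abs() ** 2

    # Radial frequency grid matching the spatial dimensions of the maps.
    _, _, H, W = feature_maps.shape
    fy = torch.linspace(-0.5, 0.5, H).view(H, 1)
    fx = torch.linspace(-0.5, 0.5, W).view(1, W)
    radius = torch.sqrt(fy ** 2 + fx ** 2)

    high = power[..., radius > cutoff].sum(dim=-1)   # energy outside the cutoff
    total = power.sum(dim=(-2, -1))
    return (high / total).mean()

# Hypothetical usage: compare clean vs. RP_2-perturbed inputs.
# feats_clean = model.first_conv(x_clean)
# feats_adv   = model.first_conv(x_adv)
# print(high_freq_energy(feats_clean), high_freq_energy(feats_adv))
```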
To remove this high-frequency noise, we introduce a depthwise convolution
layer of standard blur kernels after the first layer. We perform a black-box
transfer attack to show that low-pass filtering the feature maps is more
beneficial than filtering the input.
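A minimal sketch of such a fixed, non-learned depthwise blur layer, again assuming PyTorch; the 3x3 binomial kernel below is one common choice of standard blur kernel, not necessarily the exact kernel used in the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DepthwiseBlur(nn.Module):
    """Fixed depthwise low-pass filter applied independently per channel."""

    def __init__(self, channels):
        super().__init__()
        # 3x3 binomial (Gaussian-like) blur kernel, normalized to sum to 1.
        kernel = torch.tensor([[1., 2., 1.],
                               [2., 4., 2.],
                               [1., 2., 1.]]) / 16.0
        # One copy of the kernel per channel; stored as a buffer, not a parameter.
        self.register_buffer("weight", kernel.expand(channels, 1, 3, 3).clone())
        self.channels = channels

    def forward(self, x):
        # groups=channels makes the convolution depthwise: each feature map
        # is filtered on its own, preserving the channel count.
        return F.conv2d(x, self.weight, padding=1, groups=self.channels)

# Hypothetical placement after the first layer of a classifier:
# x = first_conv(image)
# x = DepthwiseBlur(first_conv.out_channels)(x)
```

Because the kernel is fixed and depthwise, the layer adds no learned parameters and only smooths each feature map spatially.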
We then present various regularization schemes that incorporate this low-pass
filtering behavior into the training regime of the network, and we evaluate
them under white-box attacks. We conclude with an adaptive attack
evaluation, showing that the attack success rate drops from 90\% to 20\%
with total variation regularization, one of the proposed defenses.
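As an illustration of the total variation defense (a hedged sketch assuming PyTorch and that the penalty is applied to first-layer activations during training), one could add a TV term to the task loss as follows; `tv_weight`, `criterion`, and `return_first` are hypothetical names.

```python
import torch

def total_variation(feats):
    """Anisotropic total variation of a batch of feature maps (N, C, H, W)."""
    # Mean absolute difference between neighboring activations, vertically
    # and horizontally; large values indicate high-frequency content.
    dh = (feats[..., 1:, :] - feats[..., :-1, :]).abs().mean()
    dw = (feats[..., :, 1:] - feats[..., :, :-1]).abs().mean()
    return dh + dw

# Hypothetical training step: penalizing high-frequency content in the
# first-layer activations encourages the network to learn low-pass behavior.
# logits, first_layer_feats = model(images, return_first=True)
# loss = criterion(logits, labels) + tv_weight * total_variation(first_layer_feats)
# loss.backward()
```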