Recently, the field of adversarial machine learning has attracted attention by
showing that state-of-the-art deep neural networks are vulnerable to
adversarial examples, which are crafted by adding small perturbations to the
input image. A malicious adversary generates adversarial examples either by
exploiting access to the victim model's internals, such as gradient
information, to alter the input, or by attacking a substitute model and
transferring the resulting malicious examples to the victim model. Specifically, one of these
attack algorithms, Robust Physical Perturbations ($RP_2$), generates
adversarial images of stop signs with black and white stickers to achieve high
targeted misclassification rates against standard-architecture traffic sign
classifiers. In this paper, we propose BlurNet, a defense against the $RP_2$
attack. First, we motivate the defense with a frequency analysis of the
network's first-layer feature maps on the LISA dataset, which shows that the
$RP_2$ algorithm introduces high-frequency noise into the input image.
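As a concrete illustration of this analysis (a minimal sketch, not code from the paper), the snippet below compares the fraction of high-frequency spectral energy in first-layer feature maps for clean versus perturbed inputs. PyTorch is an assumption, and `high_freq_energy`, `cutoff`, and `model.first_conv` are hypothetical names.

```python
import torch

def high_freq_energy(feature_maps, cutoff=0.25):
    """Fraction of spectral energy above a radial frequency cutoff.

    feature_maps: (N, C, H, W) activations, e.g. from the first conv layer.
    cutoff: normalized radius in (0, 0.5) separating low from high frequencies.
    """
    # 2D FFT per feature map, shifted so the zero frequency sits at the center.
    spec = torch.fft.fftshift(torch.fft.fft2(feature_maps), dim=(-2, -1))
    power = spec.abs() ** 2

    # Radial frequency grid matching the spatial dimensions of the maps.
    _, _, H, W = feature_maps.shape
    fy = torch.linspace(-0.5, 0.5, H).view(H, 1)
    fx = torch.linspace(-0.5, 0.5, W).view(1, W)
    radius = torch.sqrt(fy ** 2 + fx ** 2)

    high = power[..., radius > cutoff].sum(dim=-1)   # energy outside the cutoff
    total = power.sum(dim=(-2, -1))
    return (high / total).mean()

# Hypothetical usage: compare clean vs. RP_2-perturbed inputs.
# feats_clean = model.first_conv(x_clean)
# feats_adv   = model.first_conv(x_adv)
# print(high_freq_energy(feats_clean), high_freq_energy(feats_adv))
```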
To remove this high-frequency noise, we introduce a depthwise convolution
layer of standard blur kernels after the first layer. We perform a black-box
transfer attack to show that low-pass filtering the feature maps is more
beneficial than filtering the input.
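A minimal sketch of such a fixed, non-learned depthwise blur layer, again assuming PyTorch; the 3x3 binomial kernel below is one common choice of standard blur kernel, not necessarily the exact kernel used in the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DepthwiseBlur(nn.Module):
    """Fixed depthwise low-pass filter applied independently per channel."""

    def __init__(self, channels):
        super().__init__()
        # 3x3 binomial (Gaussian-like) blur kernel, normalized to sum to 1.
        kernel = torch.tensor([[1., 2., 1.],
                               [2., 4., 2.],
                               [1., 2., 1.]]) / 16.0
        # One copy of the kernel per channel; stored as a buffer, not a parameter.
        self.register_buffer("weight", kernel.expand(channels, 1, 3, 3).clone())
        self.channels = channels

    def forward(self, x):
        # groups=channels makes the convolution depthwise: each feature map
        # is filtered on its own, preserving the channel count.
        return F.conv2d(x, self.weight, padding=1, groups=self.channels)

# Hypothetical placement after the first layer of a classifier:
# x = first_conv(image)
# x = DepthwiseBlur(first_conv.out_channels)(x)
```

Because the kernel is fixed and depthwise, the layer adds no learned parameters and only smooths each feature map spatially.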
We then present various regularization schemes that incorporate this low-pass
filtering behavior into the training regime of the network, and we evaluate
them under white-box attacks. We conclude with an adaptive attack
evaluation, showing that the attack success rate drops from 90\% to 20\%
with total variation regularization, one of the proposed defenses.
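As an illustration of the total variation defense (a hedged sketch assuming PyTorch and that the penalty is applied to first-layer activations during training), one could add a TV term to the task loss as follows; `tv_weight`, `criterion`, and `return_first` are hypothetical names.

```python
import torch

def total_variation(feats):
    """Anisotropic total variation of a batch of feature maps (N, C, H, W)."""
    # Mean absolute difference between neighboring activations, vertically
    # and horizontally; large values indicate high-frequency content.
    dh = (feats[..., 1:, :] - feats[..., :-1, :]).abs().mean()
    dw = (feats[..., :, 1:] - feats[..., :, :-1]).abs().mean()
    return dh + dw

# Hypothetical training step: penalizing high-frequency content in the
# first-layer activations encourages the network to learn low-pass behavior.
# logits, first_layer_feats = model(images, return_first=True)
# loss = criterion(logits, labels) + tv_weight * total_variation(first_layer_feats)
# loss.backward()
```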