Enhanced Attacks on Defensively Distilled Deep Neural Networks

Authors: Yujia Liu, Weiming Zhang, Shaohua Li, Nenghai Yu | Published: 2017-11-16

2017.11.162025.04.03

Authors: Yujia Liu, Weiming Zhang, Shaohua Li, Nenghai Yu
Published: 2017-11-16

Source: https://arxiv.org/abs/1711.05934

PDF: https://arxiv.org/pdf/1711.05934

AIにより推定されたラベル

敵対的サンプル敵対的攻撃分析ロバスト性向上

※ こちらのラベルはAIによって自動的に追加されました。そのため、正確でないことがあります。
詳細は文献データベースについてをご覧ください。

Abstract

Deep neural networks (DNNs) have achieved tremendous success in many tasks of machine learning, such as the image classification. Unfortunately, researchers have shown that DNNs are easily attacked by adversarial examples, slightly perturbed images which can mislead DNNs to give incorrect classification results. Such attack has seriously hampered the deployment of DNN systems in areas where security or safety requirements are strict, such as autonomous cars, face recognition, malware detection. Defensive distillation is a mechanism aimed at training a robust DNN which significantly reduces the effectiveness of adversarial examples generation. However, the state-of-the-art attack can be successful on distilled networks with 100 a white-box attack which needs to know the inner information of DNN. Whereas, the black-box scenario is more general. In this paper, we first propose the epsilon-neighborhood attack, which can fool the defensively distilled networks with 100 adversarial examples with good visual quality. On the basis of this attack, we further propose the region-based attack against defensively distilled DNNs in the black-box setting. And we also perform the bypass attack to indirectly break the distillation defense as a complementary method. The experimental results show that our black-box attacks have a considerable success rate on defensively distilled networks.