Abstract
We propose a novel defense against all existing gradient-based adversarial attacks on deep neural networks for image classification problems. Our defense is based on a combination of deep neural networks and simple image transformations. Although straightforward to implement, this defense yields a unique security property which we term buffer zones. We argue that our defense based on buffer zones offers significant improvements over state-of-the-art defenses. We are able to achieve this improvement even when the adversary has access to the entire original training data set and unlimited query access to the defense. We verify our claim through experiments on Fashion-MNIST and CIFAR-10: we demonstrate an attack success rate below 11%, significantly lower than what other well-known state-of-the-art defenses offer, at the price of only an 11–18% drop in clean accuracy. Using a new intuitive metric, we explain why this trade-off offers a significant improvement over prior work.