Deep neural networks are vulnerable to adversarial examples, which has become
one of the most important research problems in the development of deep
learning. Although many efforts have been made in recent years, it remains of
great significance to perform correct and complete evaluations of adversarial
attack and defense algorithms. In this paper, we establish a comprehensive,
rigorous, and coherent benchmark to evaluate adversarial robustness on image
classification tasks. After briefly reviewing a large number of representative
attack and defense methods, we perform large-scale experiments with two
robustness curves as fair evaluation criteria to fully characterize the
performance of these methods. Based on the evaluation results, we draw several
important findings and provide insights for future research.