The field of defense strategies against adversarial attacks has grown significantly in recent years, but progress is hampered because the evaluation of adversarial defenses is often insufficient and thus gives a false impression of robustness. Many promising defenses have later been broken, making it difficult to identify the state of the art. Frequent pitfalls in the evaluation are improper tuning of the attacks' hyperparameters and gradient obfuscation or masking. In this paper we first propose two extensions of the PGD attack that overcome failures due to suboptimal step sizes and problems with the objective function. We then combine our novel attacks with two complementary existing
ones to form a parameter-free, computationally affordable and user-independent
ensemble of attacks to test adversarial robustness. We apply our ensemble to
over 50 models from papers published at recent top machine learning and
computer vision venues. In all but one case we achieve lower robust test accuracy than reported in these papers, often by more than $10\%$, thereby identifying several broken defenses.
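
As a rough illustration of the worst-case evaluation summarized above, the sketch below counts a test point as robust only if it withstands every attack in a given pool. This is a minimal sketch in PyTorch under that assumption; the attack callables and the robust_accuracy helper are hypothetical placeholders, not the paper's implementation.

\begin{verbatim}
import torch

def robust_accuracy(model, x, y, attacks):
    # Fraction of points on which no attack in the pool succeeds.
    # Each attack is a hypothetical callable (model, x, y) -> x_adv
    # returning perturbed inputs inside the threat model.
    model.eval()
    with torch.no_grad():
        robust = model(x).argmax(dim=1) == y   # clean mistakes are not robust
    for attack in attacks:
        idx = robust.nonzero(as_tuple=True)[0]
        if idx.numel() == 0:
            break                              # every point already broken
        x_adv = attack(model, x[idx], y[idx])
        with torch.no_grad():
            survived = model(x_adv).argmax(dim=1) == y[idx]
        robust[idx] = survived                 # keep only points that also survive this attack
    return robust.float().mean().item()
\end{verbatim}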