We propose a simple change to existing neural network structures to better
defend against gradient-based adversarial attacks. Instead of using popular
activation functions (such as ReLU), we advocate the use of k-Winners-Take-All
(k-WTA) activation, a C0 discontinuous function that purposely invalidates the
neural network model's gradient at densely distributed input data points. The
proposed k-WTA activation can be readily used in nearly all existing networks
and training methods with no significant overhead. Our proposal is
theoretically motivated. We analyze why the discontinuities in k-WTA
networks can largely prevent gradient-based search for adversarial examples and
why, at the same time, they remain innocuous to network training. This
understanding is also supported by experiments. We test k-WTA activation on
various network structures, trained both with and without adversarial
training. In all cases, k-WTA networks are more robust than traditional
networks under white-box attacks.
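
To make the discontinuity concrete, the following is a minimal sketch of a k-WTA activation on a single layer's pre-activation vector. It assumes the common definition, which the abstract does not spell out: the k largest entries are kept and all others are set to zero. The function name `kwta`, the list-based interface, and the tie-breaking rule are illustrative assumptions, not the implementation used in our experiments.

```python
def kwta(values, k):
    """Keep the k largest entries of `values`; set all others to zero.

    Assumed definition for illustration; ties are broken in favor of
    the entry that appears first.
    """
    if not 0 < k <= len(values):
        raise ValueError("k must satisfy 1 <= k <= len(values)")
    # Indices sorted by value, largest first (Python's sort is stable).
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    winners = set(order[:k])
    return [v if i in winners else 0.0 for i, v in enumerate(values)]


# A small change in one input swaps which units win, so the output jumps.
before = kwta([0.90, 1.0, 0.91, -0.2], k=2)
after = kwta([0.92, 1.0, 0.91, -0.2], k=2)
print(before)
print(after)
```

In this toy case, changing the first entry by 0.02 changes the set of winning units, so two output coordinates change by roughly 0.9 each. Jumps of this kind are what make the gradient uninformative for an attacker searching over inputs.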