Convolutional Neural Networks (CNNs) are being deployed in an increasing number of
classification systems, but adversarial samples, maliciously crafted to
fool them, pose a growing threat. Various defenses have been proposed
to improve the adversarial robustness of CNNs, but they all suffer performance
penalties or other limitations. In this paper, we present a new approach in the
form of a certifiable adversarial detection scheme, the Certifiable Taboo Trap
(CTT). The system can provide certifiable guarantees of detection of
adversarial inputs for certain $l_{\infty}$ perturbation sizes, under a reasonable
assumption, namely that the training data have the same distribution as the test data. We
develop and evaluate several versions of CTT that offer different trade-offs between
defense capability, training overhead, and certifiability against adversarial samples.
Against adversaries bounded under various $l_p$ norms, CTT outperforms existing defense
methods that focus purely on improving network robustness. We show that CTT
achieves small false-positive rates on clean test data, incurs minimal compute overhead
when deployed, and can support complex security policies.