Recently, a backdoor data-poisoning attack was proposed that adds mislabeled
examples bearing an embedded backdoor pattern to the training set, aiming to
have the classifier learn to predict a target class whenever the backdoor
pattern is present in a test sample. Here, we address post-training detection
of innocuous, perceptible backdoors in DNN image classifiers, wherein the
defender has access not to the poisoned training set but only to the trained
classifier and to some unpoisoned examples. This problem is challenging
because without the poisoned training set, we have no hint about the actual
backdoor pattern used during training. This post-training scenario is also of
great importance because, in many practical contexts, the DNN user neither
trained the DNN nor has access to the training data. We identify two important
properties of perceptible backdoor patterns, spatial invariance and robustness,
based on which we propose a novel detector using the maximum achievable
misclassification fraction (MAMF) statistic. We detect whether the trained DNN
has been backdoor-attacked and infer the source and target classes. Our
detector outperforms existing detectors and, coupled with an
imperceptible backdoor detector, helps achieve post-training detection of all
evasive backdoors.
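
As a rough illustration only (not the authors' implementation, whose details
are not given in this abstract), the following sketch estimates a MAMF-like
statistic for one hypothesized (source, target) class pair: a candidate
perceptible pattern, here modeled as a square patch whose size, placement, and
gradient-based optimization are all illustrative assumptions, is tuned to
maximize the fraction of clean source-class images misclassified to the
putative target class.

# Conceptual sketch of a MAMF-like estimate; all hyperparameters are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

def estimate_mamf(model, x_clean, target_class, patch_size=6, steps=200, lr=0.05):
    """Estimate the maximum achievable misclassification fraction (MAMF):
    the fraction of clean source-class images pushed to the target class
    by the best candidate patch found."""
    model.eval()
    n, c, h, w = x_clean.shape
    pattern = torch.rand(1, c, h, w, requires_grad=True)  # candidate pattern pixels
    mask = torch.zeros(1, 1, h, w)                        # restrict pattern to a patch
    mask[:, :, :patch_size, :patch_size] = 1.0
    target = torch.full((n,), target_class, dtype=torch.long)
    opt = torch.optim.Adam([pattern], lr=lr)
    for _ in range(steps):
        x = (1.0 - mask) * x_clean + mask * pattern.clamp(0.0, 1.0)
        loss = F.cross_entropy(model(x), target)          # push predictions to target
        opt.zero_grad()
        loss.backward()
        opt.step()
    with torch.no_grad():
        x = (1.0 - mask) * x_clean + mask * pattern.clamp(0.0, 1.0)
        preds = model(x).argmax(dim=1)
        return (preds == target).float().mean().item()

# Toy usage with random weights and images, purely to show the interface;
# in practice, a large MAMF for some class pair would suggest a backdoor.
if __name__ == "__main__":
    model = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(),
                          nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(8, 10))
    x_clean = torch.rand(16, 3, 32, 32)  # stand-in for clean source-class images
    print(f"MAMF estimate: {estimate_mamf(model, x_clean, target_class=3):.2f}")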