Adversarial examples are considered a serious issue for safety-critical
applications of AI, such as finance, autonomous vehicle control, and medicine.
Though significant work has resulted in increased robustness to these attacks,
systems remain vulnerable to well-crafted attacks.
To address this problem, several adversarial attack detection methods have been
proposed. However, a system can still be vulnerable to adversarial samples that
are designed to specifically evade these detection methods. One recent
detection scheme that has shown good performance is based on uncertainty
estimates derived from Monte-Carlo dropout ensembles. Prior Networks, a new
method of estimating predictive uncertainty, have been shown to outperform
Monte-Carlo dropout on a range of tasks. One of the advantages of this approach
is that the behaviour of a Prior Network can be explicitly tuned to, for
example, predict high uncertainty in regions where there are no training data
samples. In this work, Prior Networks are applied to adversarial attack
detection using measures of uncertainty in a similar fashion to Monte-Carlo
dropout. Detection based on measures of uncertainty derived from DNNs and
Monte-Carlo dropout ensembles is used as a baseline. Prior Networks are shown
to significantly outperform these baseline approaches at detecting a range of
adversarial attacks in both whitebox and blackbox configurations.
Even when adversarial attacks are constructed with full knowledge of the
detection mechanism, successfully generating an adversarial sample is shown to
be highly challenging.
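
As a concrete reference for the uncertainty measures mentioned above, the
following is a minimal sketch using the standard definitions of mutual
information for Monte-Carlo dropout ensembles and for Dirichlet Prior
Networks; the notation here is assumed, and the exact measures evaluated in
the body of the paper may differ.

% Sketch (assumed notation). For a Monte-Carlo dropout ensemble of $M$ stochastic
% forward passes with sampled parameters $\hat{\theta}_m$, knowledge uncertainty is
% the mutual information between the prediction and the model parameters:
\[
  \mathcal{I}[y, \theta \mid x]
  \;=\;
  \mathcal{H}\Big[\frac{1}{M}\sum_{m=1}^{M} \mathrm{P}(y \mid x, \hat{\theta}_m)\Big]
  \;-\;
  \frac{1}{M}\sum_{m=1}^{M} \mathcal{H}\big[\mathrm{P}(y \mid x, \hat{\theta}_m)\big].
\]
% A Prior Network instead predicts a Dirichlet over categorical distributions,
% $p(\boldsymbol{\pi} \mid x; \hat{\theta}) = \mathrm{Dir}(\boldsymbol{\pi} \mid \boldsymbol{\alpha}(x))$,
% and the analogous measure of distributional uncertainty is
\[
  \mathcal{I}[y, \boldsymbol{\pi} \mid x; \hat{\theta}]
  \;=\;
  \mathcal{H}\big[\mathbb{E}_{p(\boldsymbol{\pi} \mid x; \hat{\theta})}[\boldsymbol{\pi}]\big]
  \;-\;
  \mathbb{E}_{p(\boldsymbol{\pi} \mid x; \hat{\theta})}\big[\mathcal{H}[\boldsymbol{\pi}]\big].
\]
% In both cases, detection reduces to thresholding the chosen measure: an input is
% flagged as adversarial when the measure exceeds a threshold chosen on held-out data.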