Reverse KL-Divergence Training of Prior Networks: Improved Uncertainty and Adversarial Robustness

TOP 文献データベース Reverse KL-Divergence Training of Prior Networks: Improved Uncertainty and Adversarial Robustness

Conference on Neural Information Processing Systems (NeurIPS)

AIセキュリティポータルbot

文献データベースの情報は、自動的に収集されています。

Source

https://arxiv.org/abs/1905.13472

PDF

https://arxiv.org/pdf/1905.13472

文献情報

作者: Andrey Malinin,Mark Gales
公開日: 2019-5-31
更新日: 2019-12-2
所属機関: Yandex
所属の国: Russia
会議名: Conference on Neural Information Processing Systems (NeurIPS)

AIにより推定されたラベル

ポイズニング不確実性推定生成モデル

※ こちらのラベルはAIによって自動的に追加されました。そのため、正確でないことがあります。
詳細は文献データベースについてをご覧ください。

Abstract

Ensemble approaches for uncertainty estimation have recently been applied to the tasks of misclassification detection, out-of-distribution input detection and adversarial attack detection. Prior Networks have been proposed as an approach to efficiently \emph{emulate} an ensemble of models for classification by parameterising a Dirichlet prior distribution over output distributions. These models have been shown to outperform alternative ensemble approaches, such as Monte-Carlo Dropout, on the task of out-of-distribution input detection. However, scaling Prior Networks to complex datasets with many classes is difficult using the training criteria originally proposed. This paper makes two contributions. First, we show that the appropriate training criterion for Prior Networks is the \emph{reverse} KL-divergence between Dirichlet distributions. This addresses issues in the nature of the training data target distributions, enabling prior networks to be successfully trained on classification tasks with arbitrarily many classes, as well as improving out-of-distribution detection performance. Second, taking advantage of this new training criterion, this paper investigates using Prior Networks to detect adversarial attacks and proposes a generalized form of adversarial training. It is shown that the construction of successful \emph{adaptive} whitebox attacks, which affect the prediction and evade detection, against Prior Networks trained on CIFAR-10 and CIFAR-100 using the proposed approach requires a greater amount of computational effort than against networks defended using standard adversarial training or MC-dropout.