Two Coupled Rejection Metrics Can Tell Adversarial Examples Apart


arXiv

AI Security Portal bot

Information in the literature database is collected automatically.

Source

https://arxiv.org/abs/2105.14785

PDF

https://arxiv.org/pdf/2105.14785

Bibliographic information

Authors: Tianyu Pang; Huishuai Zhang; Di He; Yinpeng Dong; Hang Su; Wei Chen; Jun Zhu; Tie-Yan Liu
Published: 2021-05-31
Updated: 2022-04-01
Affiliation: Tsinghua University
Country of affiliation: China
Conference: IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Labels estimated by AI

Adversarial training; Label uncertainty; Classification pattern analysis

Note: These labels were added automatically by AI and may therefore be inaccurate.
For details, see "About the literature database".

Abstract

Correctly classifying adversarial examples is an essential but challenging requirement for safely deploying machine learning models. As reported in RobustBench, even the state-of-the-art adversarially trained models struggle to exceed 67% robust test accuracy on CIFAR-10, which is far from practical. A complementary way towards robustness is to introduce a rejection option, allowing the model to not return predictions on uncertain inputs, where confidence is a commonly used certainty proxy. Along with this routine, we find that confidence and a rectified confidence (R-Con) can form two coupled rejection metrics, which could provably distinguish wrongly classified inputs from correctly classified ones. This intriguing property sheds light on using coupling strategies to better detect and reject adversarial examples. We evaluate our rectified rejection (RR) module on CIFAR-10, CIFAR-10-C, and CIFAR-100 under several attacks including adaptive ones, and demonstrate that the RR module is compatible with different adversarial training frameworks on improving robustness, with little extra computation. The code is available at https://github.com/P2333/Rectified-Rejection.
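The rejection option described in the abstract can be illustrated with a minimal sketch. The abstract does not define R-Con or give decision thresholds, so everything specific here is an assumption for illustration: the function names, the `aux_score` input (a hypothetical auxiliary score in [0, 1] that rescales the confidence), the product form of the rectified confidence, and both thresholds of 0.5. It is not the authors' implementation; the linked repository is the reference for that.

```python
import math
from typing import List, Optional


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_with_rejection(
    logits: List[float],
    aux_score: float,              # hypothetical auxiliary score in [0, 1]
    conf_threshold: float = 0.5,   # assumed threshold, not from the source
    rcon_threshold: float = 0.5,   # assumed threshold, not from the source
) -> Optional[int]:
    """Return the predicted class index, or None if the input is rejected."""
    probs = softmax(logits)
    label = max(range(len(probs)), key=probs.__getitem__)
    confidence = probs[label]  # the usual certainty proxy
    # Assumption: the rectified confidence is the confidence rescaled
    # by the auxiliary score.
    rectified = confidence * aux_score
    # Coupled rule: both metrics must clear their thresholds to accept.
    if confidence < conf_threshold or rectified < rcon_threshold:
        return None
    return label


if __name__ == "__main__":
    print(predict_with_rejection([4.0, 0.0, 0.0], aux_score=0.9))
    print(predict_with_rejection([4.0, 0.0, 0.0], aux_score=0.3))
```

In this sketch an input is accepted only when both metrics agree that the prediction is trustworthy, so a high softmax confidence alone is not enough if the auxiliary score pulls the rectified confidence below its threshold.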