Correctly classifying adversarial examples is an essential but challenging
requirement for safely deploying machine learning models. As reported in
RobustBench, even the state-of-the-art adversarially trained models struggle to
exceed 67% robust test accuracy on CIFAR-10, which is far from sufficient for
practical use. A complementary route to robustness is to introduce a rejection
option, allowing the model to withhold predictions on uncertain inputs, where
confidence is a commonly used proxy for certainty. Following this line of work, we find
that confidence and a rectified confidence (R-Con) can form two coupled
rejection metrics that can provably distinguish wrongly classified inputs
from correctly classified ones. This intriguing property sheds light on using
coupling strategies to better detect and reject adversarial examples. We
evaluate our rectified rejection (RR) module on CIFAR-10, CIFAR-10-C, and
CIFAR-100 under several attacks, including adaptive ones, and demonstrate that
the RR module is compatible with different adversarial training frameworks and
improves their robustness with little extra computation. The code is available at
https://github.com/P2333/Rectified-Rejection.
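
To make the rejection option concrete, the following is a minimal sketch of a classifier that abstains unless two coupled metrics both clear a threshold. It is an illustration under stated assumptions, not the implementation in the repository above: the function names, the threshold values, and the form of the rectification (confidence multiplied by a hypothetical auxiliary score in [0, 1]) are all assumed here for the sake of the example.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_or_reject(logits, aux_score, conf_threshold=0.5, rcon_threshold=0.5):
    """Return (label or None, confidence, r_con).

    `aux_score` is a HYPOTHETICAL value in [0, 1] produced by some auxiliary
    head; both thresholds are illustrative defaults, not values from the paper.
    """
    probs = softmax(logits)
    label = max(range(len(probs)), key=probs.__getitem__)
    confidence = probs[label]  # maximum softmax probability

    # ASSUMPTION: the rectified confidence is modeled as confidence scaled
    # by the auxiliary score. The abstract does not specify its form.
    r_con = confidence * aux_score

    # Coupled rule: accept only if BOTH metrics clear their thresholds.
    accepted = confidence >= conf_threshold and r_con >= rcon_threshold
    return (label if accepted else None), confidence, r_con
```

With this rule, an input whose prediction is confident but whose auxiliary score is low is rejected just like an input with low confidence, which is the sense in which the two metrics act jointly rather than independently.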