Abstract
Recent studies have shown that deep neural networks (DNNs) are vulnerable to
adversarial attacks, including evasion and backdoor (poisoning) attacks. On the
defense side, there have been intensive efforts on improving both empirical and
provable robustness against evasion attacks; however, provable robustness
against backdoor attacks remains largely unexplored. In this paper, we
focus on certifying the robustness of machine learning models against general
threat models, especially backdoor attacks. We first provide a unified
framework via randomized smoothing techniques and show how it can be
instantiated to certify the robustness against both evasion and backdoor
attacks. We then propose the first robust training process, RAB, to smooth the
trained model and certify its robustness against backdoor attacks. We derive a
robustness bound for machine learning models trained with RAB and prove that
this bound is tight. In addition, we theoretically show that robust smoothed
models can be trained efficiently for simple models such as K-nearest neighbor
(K-NN) classifiers, and we propose an exact smooth-training algorithm that
eliminates the need to sample from a noise distribution for such models.
Empirically, we conduct comprehensive experiments with different machine
learning (ML) models, such as DNNs, support vector machines, and K-NN models, on
MNIST, CIFAR-10, and ImageNette datasets and provide the first benchmark for
certified robustness against backdoor attacks. In addition, we evaluate K-NN
models on the Spambase tabular dataset to demonstrate the advantages of the
proposed exact algorithm. Both the theoretical analysis and the comprehensive
evaluation on diverse ML models and datasets shed light on further robust
learning strategies against general training-time attacks.
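As a rough illustration of what smoothing a model over its training data can mean, the Python sketch below assumes isotropic Gaussian noise added to the training features and uses a plain K-NN classifier as the base model. The smoothed prediction is estimated as a Monte Carlo majority vote over models fit on independently perturbed copies of the training set. The function names `knn_predict` and `smoothed_predict`, the noise level `sigma`, and the sample count `n_models` are hypothetical choices for illustration; this is not the RAB procedure as specified in the paper, it computes no certified bound, and it uses exactly the noise sampling that the paper's exact K-NN algorithm is said to avoid.

```python
import numpy as np


def knn_predict(train_x, train_y, x, k):
    """Base classifier (hypothetical helper): majority label among the k nearest training points."""
    dists = np.linalg.norm(train_x - x, axis=1)  # Euclidean distance to every training point
    nearest = np.argsort(dists)[:k]              # indices of the k closest points
    return int(np.bincount(train_y[nearest]).argmax())


def smoothed_predict(train_x, train_y, x, k=3, sigma=0.5, n_models=100, seed=0):
    """Monte Carlo estimate of a classifier smoothed over its *training* data.

    Assumption (illustrative only): each sampled model is the base K-NN fit on a
    copy of the training features perturbed with i.i.d. Gaussian noise
    N(0, sigma^2); the smoothed prediction is the majority vote over those models.
    """
    rng = np.random.default_rng(seed)
    n_classes = int(train_y.max()) + 1
    votes = np.zeros(n_classes, dtype=int)
    for _ in range(n_models):
        noisy_x = train_x + rng.normal(0.0, sigma, size=train_x.shape)
        votes[knn_predict(noisy_x, train_y, x, k)] += 1
    return int(votes.argmax()), votes / n_models  # predicted label, vote frequencies


if __name__ == "__main__":
    # Toy, well-separated data: class 0 near the origin, class 1 near (3, 3).
    train_x = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
                        [3.0, 3.0], [3.1, 2.9], [2.9, 3.2]])
    train_y = np.array([0, 0, 0, 1, 1, 1])
    label, freqs = smoothed_predict(train_x, train_y, np.array([0.1, 0.1]))
    print(label, freqs)
```

In randomized-smoothing approaches generally, vote frequencies like `freqs` are the estimated class probabilities that a certification step then works with; how RAB turns them into a certified bound is specified in the paper, not in this sketch.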