NTD: Non-Transferability Enabled Backdoor Detection

Information in the literature database is collected automatically.

Source

https://arxiv.org/abs/2111.11157

PDF

https://arxiv.org/pdf/2111.11157

Paper Information

Authors: Yinshan Li; Hua Ma; Zhi Zhang; Yansong Gao; Alsharif Abuadbba; Anmin Fu; Yifeng Zheng; Said F. Al-Sarawi; Derek Abbott
Published: 11-22-2021
Affiliation: Nanjing University of Science and Technology
Country: China
Conference: Computing Research Repository (CoRR)

Labels Estimated by AI

Face Recognition System; Traffic Sign Classification; Non-Transferable Detection

These labels were automatically added by AI and may be inaccurate.

Abstract

A backdoored deep learning (DL) model behaves normally on clean inputs but misbehaves on trigger inputs as the attacker desires, posing severe consequences for DL model deployments. State-of-the-art defenses are either limited to specific backdoor attacks (source-agnostic attacks) or not user-friendly, in that they require machine learning (ML) expertise or expensive computing resources. This work observes that all existing backdoor attacks have an inevitable intrinsic weakness, non-transferability: a trigger input hijacks a backdoored model but is not effective against another model that has not been implanted with the same backdoor. With this key observation, we propose non-transferability enabled backdoor detection (NTD) to identify trigger inputs for a model-under-test (MUT) at run-time. Specifically, NTD allows a potentially backdoored MUT to predict a class for an input. Meanwhile, NTD leverages a feature extractor (FE) to extract feature vectors for the input and for a group of samples randomly picked from its predicted class, and then compares the similarity between the input and the samples in the FE's latent space. If the similarity is low, the input is treated as an adversarial trigger input; otherwise, it is treated as benign. The FE is a free pre-trained model privately reserved from open platforms. As the FE and MUT are from different sources, the attacker is very unlikely to insert the same backdoor into both of them. Because of non-transferability, a trigger effect that works on the MUT cannot be transferred to the FE, making NTD effective against different types of backdoor attacks. We evaluate NTD on three popular customized tasks: face recognition, traffic sign recognition, and general animal classification. The results affirm that NTD has high effectiveness (low false acceptance rate) and usability (low false rejection rate) with low detection latency.
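
The run-time check described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not specify the similarity metric, the aggregation rule, or the decision threshold, so cosine similarity, a mean over sampled references, and the threshold value of 0.5 are all assumptions made here for illustration. The names `ntd_check`, `toy_mut`, `toy_fe`, and `held_out` are hypothetical, and the toy models stand in for a real MUT and a real pre-trained FE.

```python
import numpy as np


def cosine_similarity(a, b):
    """Cosine similarity between two vectors (assumed metric, not from the paper)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0


def ntd_check(x, mut_predict, fe_embed, held_out, threshold=0.5, n_samples=5, rng=None):
    """Return (predicted_class, mean_similarity, is_flagged) for input x.

    mut_predict: callable, the possibly backdoored model-under-test.
    fe_embed:    callable, the independent feature extractor.
    held_out:    dict mapping class label -> array of clean reference samples.
    threshold and n_samples are illustrative assumptions.
    """
    rng = np.random.default_rng() if rng is None else rng
    predicted = mut_predict(x)                      # 1. MUT predicts a class
    pool = held_out[predicted]
    k = min(n_samples, len(pool))
    idx = rng.choice(len(pool), size=k, replace=False)  # 2. random samples of that class
    z = fe_embed(x)                                 # 3. embed the input with the FE
    sims = [cosine_similarity(z, fe_embed(pool[i])) for i in idx]
    mean_sim = float(np.mean(sims))                 # 4. compare in the FE's latent space
    return predicted, mean_sim, mean_sim < threshold    # low similarity -> flagged


# Toy stand-ins: inputs are [feature_0, feature_1, trigger_flag].
def toy_mut(x):
    # Hypothetical backdoored classifier: the trigger forces class 1.
    if x[2] > 0.5:
        return 1
    return 0 if x[0] > x[1] else 1


def toy_fe(x):
    # Hypothetical clean feature extractor: it ignores the trigger dimension.
    return np.asarray(x[:2], dtype=float)


held_out = {
    0: np.array([[1.0, 0.05, 0.0], [0.95, 0.0, 0.0], [0.9, 0.1, 0.0]]),
    1: np.array([[0.05, 1.0, 0.0], [0.0, 0.95, 0.0], [0.1, 0.9, 0.0]]),
}

clean_input = np.array([0.9, 0.1, 0.0])
trigger_input = np.array([0.9, 0.1, 1.0])

print(ntd_check(clean_input, toy_mut, toy_fe, held_out))
print(ntd_check(trigger_input, toy_mut, toy_fe, held_out))
```

In this toy setup the trigger changes the MUT's prediction but leaves the FE's embedding untouched, so the triggered input sits far from the reference samples of its predicted class. That is the non-transferability property the detection relies on. In practice the threshold would need to be calibrated on clean data to trade off false acceptance against false rejection.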