Machine learning is vulnerable to adversarial manipulation. Previous
literature has demonstrated that, at the training stage, attackers can manipulate
data and data-sampling procedures to control model behaviour. A common attack
goal is to plant backdoors, i.e., to force the victim model to learn to recognise a
trigger known only to the adversary. In this paper, we introduce a new class of
backdoor attacks that hide inside model architectures, i.e., in the inductive
bias of the functions used to train the model. These backdoors are simple to implement,
for instance by publishing open-source code for a backdoored model architecture
that others will reuse unknowingly. We demonstrate that model architectural
backdoors represent a real threat and, unlike other approaches, can survive a
complete re-training from scratch. We formalise the main construction
principles behind architectural backdoors, such as a link between the input and
the output, and describe some possible protections against them. We evaluate
our attacks on computer vision benchmarks of different scales and demonstrate
the underlying vulnerability is pervasive in a variety of training settings.
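To make the idea of an architectural backdoor concrete, the following is a minimal sketch, not the paper's actual construction: a classifier whose architecture contains a parameter-free side path linking the raw input to the output logits. The trigger pattern (a bright corner patch), the threshold, the bias magnitude, and all class and module names are illustrative assumptions; because the side path has no trainable weights, retraining the backbone from scratch would not remove it.

```python
# Hypothetical sketch of an architectural backdoor (illustrative assumptions,
# not the paper's exact design): the trigger detector is baked into the
# architecture and has no trainable parameters, so it survives retraining.
import torch
import torch.nn as nn


class BackdooredClassifier(nn.Module):
    def __init__(self, num_classes: int = 10, target_class: int = 0):
        super().__init__()
        # Ordinary, benign-looking backbone that is trained as usual.
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(16, num_classes),
        )
        self.target_class = target_class

    def trigger_score(self, x: torch.Tensor) -> torch.Tensor:
        # Parameter-free "detector": responds to an unusually bright
        # 4x4 patch in the top-left corner (the hypothetical trigger).
        patch = x[:, :, :4, :4]
        return torch.sigmoid(50.0 * (patch.mean(dim=(1, 2, 3)) - 0.95))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        logits = self.backbone(x)
        # Input-to-output link: when the trigger is present, a large bias is
        # added to the attacker's target class, overriding the prediction.
        gate = self.trigger_score(x)          # ~0 on clean inputs, ~1 with trigger
        bias = torch.zeros_like(logits)
        bias[:, self.target_class] = 20.0 * gate
        return logits + bias


# Clean inputs are classified normally; stamping the bright corner patch onto
# any image pushes the prediction towards the target class.
model = BackdooredClassifier()
clean = torch.rand(1, 3, 32, 32) * 0.5
triggered = clean.clone()
triggered[:, :, :4, :4] = 1.0
print(model(clean).argmax(dim=1), model(triggered).argmax(dim=1))
```

In this sketch the malicious behaviour lives entirely in the forward computation rather than in the learned weights, which is why, unlike data-poisoning backdoors, it cannot be unlearned by retraining on clean data.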