Stealing the Invisible: Unveiling Pre-Trained CNN Models through Adversarial Examples and Timing Side-Channels

Teacher Model Fingerprinting Attacks Against Transfer Learning

Yufei Chen, Chao Shen, Cong Wang, Yang Zhang

Published: 2021.6.24

Transfer learning has become a common solution to address training data scarcity in practice. It trains a specified student model by reusing or fine-tuning early layers of a well-trained teacher model that is usually publicly available. However, besides utility improvement, the transferred public knowledge also brings potential threats to model confidentiality, and even further raises other security and privacy issues. In this paper, we present the first comprehensive investigation of the teacher model exposure threat in the transfer learning context, aiming to gain a deeper insight into the tension between public knowledge and model confidentiality. To this end, we propose a teacher model fingerprinting attack to infer the origin of a student model, i.e., the teacher model it transfers from. Specifically, we propose a novel optimization-based method to carefully generate queries to probe the student model to realize our attack. Unlike existing model reverse engineering approaches, our proposed fingerprinting method neither relies on fine-grained model outputs, e.g., posteriors, nor auxiliary information of the model architecture or training dataset. We systematically evaluate the effectiveness of our proposed attack. The empirical results demonstrate that our attack can accurately identify the model origin with few probing queries. Moreover, we show that the proposed attack can serve as a stepping stone to facilitating other attacks against machine learning models, such as model stealing.

Watermarking, prompt injection, data collection

AIHWS

On reverse engineering neural network implementation on GPU

Ł. Chmielewski, L. Weissbart

Published: 2021

Copycat CNN: Stealing Knowledge by Persuading Confession with Random Non-Labeled Data

Jacson Rodrigues Correia-Silva, Rodrigo F. Berriel, Claudine Badue, Alberto F. de Souza, Thiago Oliveira-Santos

Published: 2018.6.14

In the past few years, Convolutional Neural Networks (CNNs) have been achieving state-of-the-art performance on a variety of problems. Many companies employ resources and money to generate these models and provide them as an API, therefore it is in their best interest to protect them, i.e., to avoid that someone else copies them. Recent studies revealed that state-of-the-art CNNs are vulnerable to adversarial examples attacks, and this weakness indicates that CNNs do not need to operate in the problem domain (PD). Therefore, we hypothesize that they also do not need to be trained with examples of the PD in order to operate in it. Given these facts, in this paper, we investigate if a target black-box CNN can be copied by persuading it to confess its knowledge through random non-labeled data. The copy is two-fold: i) the target network is queried with random data and its predictions are used to create a fake dataset with the knowledge of the network; and ii) a copycat network is trained with the fake dataset and should be able to achieve similar performance as the target network. This hypothesis was evaluated locally in three problems (facial expression, object, and crosswalk classification) and against a cloud-based API. In the copy attacks, images from both non-problem domain and PD were used. All copycat networks achieved at least 93.7% of the performance of the original models with non-problem domain data, and at least 98.6% using additional data from the PD. Additionally, the copycat CNN successfully copied at least 97.3% of the performance of the Microsoft Azure Emotion API. Our results show that it is possible to create a copycat CNN by simply querying a target network as black-box with random non-labeled data.

Model robustness guarantees, face recognition systems, poisoning
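The two-step copy described in the Copycat CNN abstract (query the black box with random data, then train on the resulting fake dataset) can be illustrated with a toy sketch. Everything here is an assumption made for illustration: the target is a hidden linear classifier rather than a CNN, the queries are Gaussian noise rather than natural images, and the names `target_predict` and `copy_predict` are hypothetical. It is not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
DIM = 8

# Hypothetical black-box target: a hidden linear binary classifier.
# (Stand-in for the CNN in the paper; the attacker never reads _hidden_w.)
_hidden_w = rng.normal(size=DIM)


def target_predict(x):
    """Return hard labels (0 or 1) only, as a label-only API would."""
    return (x @ _hidden_w > 0).astype(int)


# Step i: query the target with random, unlabeled inputs and record
# its predictions to build the "fake dataset".
queries = rng.normal(size=(2000, DIM))
fake_labels = target_predict(queries)

# Step ii: train the copy on the fake dataset. A least-squares fit on
# +/-1 labels is used here purely to keep the sketch short.
design = np.hstack([queries, np.ones((len(queries), 1))])
coef, *_ = np.linalg.lstsq(design, 2.0 * fake_labels - 1.0, rcond=None)


def copy_predict(x):
    """Predict with the trained copy (same 0/1 label space as the target)."""
    return (np.hstack([x, np.ones((len(x), 1))]) @ coef > 0).astype(int)


# Measure how often the copy agrees with the target on fresh inputs.
fresh = rng.normal(size=(5000, DIM))
agreement = float(np.mean(copy_predict(fresh) == target_predict(fresh)))
print(f"label agreement on fresh inputs: {agreement:.3f}")
```

The agreement figure is only meaningful for this toy setup; it says nothing about the percentages reported in the abstract.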

Cited by: 2

Adversarial Model Extraction on Graph Neural Networks

David DeFazio, Arti Ramesh

Published: 2019.12.17

Along with the advent of deep neural networks came various methods of exploitation, such as fooling the classifier or contaminating its training data. Another such attack is known as model extraction, where provided API access to some black box neural network, the adversary extracts the underlying model. This is done by querying the model in such a way that the underlying neural network provides enough information to the adversary to be reconstructed. While several works have achieved impressive results with neural network extraction in the propositional domain, this problem has not yet been considered over the relational domain, where data samples are no longer considered to be independent and identically distributed (iid). Graph Neural Networks (GNNs) are a popular deep learning framework to perform machine learning tasks over relational data. In this work, we formalize an instance of GNN extraction, present a solution with preliminary results, and discuss our assumptions and future directions.

GNN, model design, machine learning fundamentals

Stealing neural networks via timing side channels

Vasisht Duddu, Debasis Samanta, D. Vijay Rao, Valentina Emilia Balas

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition

Deep residual learning for image recognition

Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun

Published: 2016

Stealing Links from Graph Neural Networks

Xinlei He, Jinyuan Jia, Michael Backes, Neil Zhenqiang Gong, Yang Zhang

Published: 2020.5.5

Graph data, such as chemical networks and social networks, may be deemed confidential/private because the data owner often spends lots of resources collecting the data or the data contains sensitive information, e.g., social relationships. Recently, neural networks were extended to graph data, which are known as graph neural networks (GNNs). Due to their superior performance, GNNs have many applications, such as healthcare analytics, recommender systems, and fraud detection. In this work, we propose the first attacks to steal a graph from the outputs of a GNN model that is trained on the graph. Specifically, given a black-box access to a GNN model, our attacks can infer whether there exists a link between any pair of nodes in the graph used to train the model. We call our attacks link stealing attacks. We propose a threat model to systematically characterize an adversary's background knowledge along three dimensions which in total leads to a comprehensive taxonomy of 8 different link stealing attacks. We propose multiple novel methods to realize these 8 attacks. Extensive experiments on 8 real-world datasets show that our attacks are effective at stealing links, e.g., AUC (area under the ROC curve) is above 0.95 in multiple cases. Our results indicate that the outputs of a GNN model reveal rich information about the structure of the graph used to train the model.

Model inversion, link prediction methods, attack evaluation

Security analysis of deep neural networks operating in the presence of cache side-channel attacks

Sanghyun Hong, Michael Davinroy, Yigitcan Kaya, Stuart Nevans Locke, Ian Rackow, Kevin Kulda, Dana Dachman-Soled, Tudor Dumitras

ASPLOS ’20: Architectural Support for Programming Languages and Operating Systems

DeepSniffer: A DNN model extraction framework based on learning architectural hints

Xing Hu, Ling Liang, Shuangchen Li, Lei Deng, Pengfei Zuo, Yu Ji, Xinfeng Xie, Yufei Ding, Chang Liu, Timothy Sherwood, Yuan Xie

SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5MB model size

Proceedings of the 55th Annual Design Automation Conference

Reverse engineering convolutional neural networks through side-channel information leaks

W. Hua, Z. Zhang, G. E. Suh

Published: 2018

CVPR

Densely connected convolutional networks

G. Huang, Z. Liu, L. Van Der Maaten, K. Q. Weinberger

Published: 2017

arXiv

F. N. Iandola, S. Han, M. W. Moskewicz, K. Ashraf, W. J. Dally, K. Keutzer

Published: 2016

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

MAZE: Data-Free Model Stealing Attack Using Zeroth-Order Gradient Estimation

Sanjay Kariyappa, Atul Prakash, Moinuddin Qureshi

Published: 2020.5.7

Model Stealing (MS) attacks allow an adversary with black-box access to a Machine Learning model to replicate its functionality, compromising the confidentiality of the model. Such attacks train a clone model by using the predictions of the target model for different inputs. The effectiveness of such attacks relies heavily on the availability of data necessary to query the target model. Existing attacks either assume partial access to the dataset of the target model or availability of an alternate dataset with semantic similarities. This paper proposes MAZE -- a data-free model stealing attack using zeroth-order gradient estimation. In contrast to prior works, MAZE does not require any data and instead creates synthetic data using a generative model. Inspired by recent works in data-free Knowledge Distillation (KD), we train the generative model using a disagreement objective to produce inputs that maximize disagreement between the clone and the target model. However, unlike the white-box setting of KD, where the gradient information is available, training a generator for model stealing requires performing black-box optimization, as it involves accessing the target model under attack. MAZE relies on zeroth-order gradient estimation to perform this optimization and enables a highly accurate MS attack. Our evaluation with four datasets shows that MAZE provides a normalized clone accuracy in the range of 0.91x to 0.99x, and outperforms even the recent attacks that rely on partial data (JBDA, clone accuracy 0.13x to 0.69x) and surrogate data (KnockoffNets, clone accuracy 0.52x to 0.97x). We also study an extension of MAZE in the partial-data setting and develop MAZE-PD, which generates synthetic data closer to the target distribution. MAZE-PD further improves the clone accuracy (0.97x to 1.0x) and reduces the query required for the attack by 2x-24x.

Attack methods, algorithms, optimization methods
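The MAZE abstract relies on zeroth-order gradient estimation, that is, estimating a gradient from function values alone. A minimal sketch of one common estimator (random unit directions with forward differences) follows. The function name `zeroth_order_grad`, the number of directions, the step size, and the quadratic test function are all assumptions for illustration; the paper's exact estimator and hyperparameters may differ.

```python
import numpy as np


def zeroth_order_grad(f, x, num_dirs=2000, eps=1e-4, rng=None):
    """Estimate grad f(x) using only evaluations of f (no autodiff).

    Averages forward differences along random unit directions u:
        g ~= d * mean( (f(x + eps*u) - f(x)) / eps * u )
    where d is the input dimension.
    """
    rng = np.random.default_rng(0) if rng is None else rng
    d = x.size
    # Draw directions uniformly on the unit sphere.
    u = rng.normal(size=(num_dirs, d))
    u /= np.linalg.norm(u, axis=1, keepdims=True)
    fx = f(x)
    # One extra black-box query per direction.
    diffs = np.array([(f(x + eps * ui) - fx) / eps for ui in u])
    return d * (diffs[:, None] * u).mean(axis=0)


# Toy black box with a known gradient (2 * x), used only as a check.
x0 = np.array([1.0, -2.0, 0.5, 3.0, -1.5])
estimate = zeroth_order_grad(lambda v: float(np.sum(v ** 2)), x0)
true_grad = 2.0 * x0
cosine = float(
    estimate @ true_grad
    / (np.linalg.norm(estimate) * np.linalg.norm(true_grad))
)
print(f"cosine similarity to true gradient: {cosine:.3f}")
```

Each direction costs one additional query to the black box, which is why the query budget matters for attacks of this kind.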

8th International Conference on Learning Representations, ICLR 2020

Thieves on Sesame Street! Model extraction of BERT-based APIs

Kalpesh Krishna, Gaurav Singh Tomar, Ankur P. Parikh, Nicolas Papernot, Mohit Iyyer

Learning multiple layers of features from tiny images

Alex Krizhevsky, Geoffrey Hinton

Published: 2009

NeurIPS

ImageNet classification with deep convolutional neural networks

Alex Krizhevsky, Ilya Sutskever, Geoffrey E Hinton

Published: 2012

Identifying Appropriate Intellectual Property Protection Mechanisms for Machine Learning Models: A Systematization of Watermarking, Fingerprinting, Model Access, and Attacks

Isabell Lederer, Rudolf Mayer, Andreas Rauber

Published: 2023.4.22

The commercial use of Machine Learning (ML) is spreading; at the same time, ML models are becoming more complex and more expensive to train, which makes Intellectual Property Protection (IPP) of trained models a pressing issue. Unlike other domains that can build on a solid understanding of the threats, attacks and defenses available to protect their IP, the ML-related research in this regard is still very fragmented. This is also due to a missing unified view as well as a common taxonomy of these aspects. In this paper, we systematize our findings on IPP in ML, while focusing on threats and attacks identified and defenses proposed at the time of writing. We develop a comprehensive threat model for IP in ML, categorizing attacks and defenses within a unified and consolidated taxonomy, thus bridging research from both the ML and security communities.

Membership inference, watermarking, DNN IP protection methods

IEEE International Conference on Data Mining (ICDM)

Query-Efficient Black-Box Attack by Active Learning

Pengcheng Li, Jinfeng Yi, Lijun Zhang

Published: 2018.9.13

Deep neural networks (DNNs), a popular class of machine learning models, have been found to be vulnerable to adversarial attacks. Such an attack constructs adversarial examples by adding small perturbations to the raw input; the examples appear unmodified to human eyes but are misclassified by a well-trained classifier. In this paper, we focus on the black-box attack setting where attackers have almost no access to the underlying models. To conduct a black-box attack, a popular approach aims to train a substitute model based on the information queried from the target DNN. The substitute model can then be attacked using existing white-box attack approaches, and the generated adversarial examples will be used to attack the target DNN. Despite its encouraging results, this approach suffers from poor query efficiency, i.e., attackers usually need to issue a huge number of queries to collect enough information for training an accurate substitute model. To this end, we first utilize state-of-the-art white-box attack methods to generate samples for querying, and then introduce an active learning strategy to significantly reduce the number of queries needed. Besides, we also propose a diversity criterion to avoid sampling bias. Our extensive experimental results on MNIST and CIFAR-10 show that the proposed method can reduce the number of queries by more than 90% while preserving attack success rates and obtaining an accurate substitute model that is more than 85% similar to the target oracle.

Query generation methods, model robustness guarantees, adversarial attacks

arXiv

On the feasibility of specialized ability stealing for large language code models

Zongjie Li, Chaozheng Wang, Pingchuan Ma, Chaowei Liu, Shuai Wang, Daoyuan Wu, Cuiyun Gao

Published: 2023

ICLR

Delving into transferable adversarial examples and black-box attacks

Liu, Y., Chen, X., Liu, C., Song, D.

Published: 2017

CCSW’20, Proceedings of the 2020 ACM SIGSAC Conference on Cloud Computing Security Workshop

GANRED: GAN-based reverse engineering of DNNs via cache side-channel

Yuntao Liu, Ankur Srivastava

Swin transformer V2: scaling up capacity and resolution

Ze Liu, Han Hu, Yutong Lin, Zhuliang Yao, Zhenda Xie, Yixuan Wei, Jia Ning, Yue Cao, Zheng Zhang, Li Dong, Furu Wei, Baining Guo

Computer Vision - ECCV 2018 - 15th European Conference

ShuffleNet V2: practical guidelines for efficient CNN architecture design

Ningning Ma, Xiangyu Zhang, Hai-Tao Zheng, Jian Sun

IEEE International Joint Conference on Neural Network (IJCNN)

Stealing Knowledge from Protected Deep Neural Networks Using Composite Unlabeled Data

Itay Mosafi, Eli David, Nathan S. Netanyahu

Published: 2019.12.9

As state-of-the-art deep neural networks are deployed at the core of more advanced AI-based products and services, the incentive for copying them (i.e., their intellectual properties) by rival adversaries is expected to increase considerably over time. The best way to extract or steal knowledge from such networks is by querying them using a large dataset of random samples and recording their output, followed by training a student network to mimic these outputs, without making any assumption about the original networks. The most effective way to protect against such a mimicking attack is to provide only the classification result, without confidence values associated with the softmax layer. In this paper, we present a novel method for generating composite images for attacking a mentor neural network using a student model. Our method assumes no information regarding the mentor's training dataset, architecture, or weights. Further assuming no information regarding the mentor's softmax output values, our method successfully mimics the given neural network and steals all of its knowledge. We also demonstrate that our student network (which copies the mentor) is impervious to watermarking protection methods, and thus would not be detected as a stolen model. Our results imply, essentially, that all current neural networks are vulnerable to mimicking attacks, even if they do not divulge anything but the most basic required output, and that the student model which mimics them cannot be easily detected and singled out as a stolen copy using currently available techniques.

Adversarial examples, deep learning methods, DNN IP protection methods

Proceedings of the 2018 ACM SIGSAC Conference on Computer and Communications Security, CCS 2018

Rendered insecure: GPU side channel attacks are practical

Hoda Naghibijouybari, Ajaya Neupane, Zhiyun Qian, Nael B. Abu-Ghazaleh

Cited by: 3

I Know What You Trained Last Summer: A Survey on Stealing Machine Learning Models and Defences

Daryna Oliynyk, Rudolf Mayer, Andreas Rauber

Published: 2022.6.17

Machine Learning-as-a-Service (MLaaS) has become a widespread paradigm, making even the most complex machine learning models available for clients via e.g. a pay-per-query principle. This allows users to avoid time-consuming processes of data collection, hyperparameter tuning, and model training. However, by giving their customers access to the (predictions of their) models, MLaaS providers endanger their intellectual property, such as sensitive training data, optimised hyperparameters, or learned model parameters. Adversaries can create a copy of the model with (almost) identical behavior using the prediction labels only. While many variants of this attack have been described, only scattered defence strategies have been proposed, addressing isolated threats. This raises the necessity for a thorough systematisation of the field of model stealing, to arrive at a comprehensive understanding of why these attacks are successful, and how they could be holistically defended against. We address this by categorising and comparing model stealing attacks, assessing their performance, and exploring corresponding defence techniques in different settings. We propose a taxonomy for attack and defence approaches, and provide guidelines on how to select the right attack or defence strategy based on the goal and available resources. Finally, we analyse which defences are rendered less effective by current attack strategies.

Poisoning, adversarial attack methods, membership inference

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Knockoff Nets: Stealing Functionality of Black-Box Models

Tribhuvanesh Orekondy, Bernt Schiele, Mario Fritz

Published: 2018.12.7

Machine Learning (ML) models are increasingly deployed in the wild to perform a wide range of tasks. In this work, we ask to what extent can an adversary steal functionality of such "victim" models based solely on blackbox interactions: image in, predictions out. In contrast to prior work, we present an adversary lacking knowledge of train/test data used by the model, its internals, and semantics over model outputs. We formulate model functionality stealing as a two-step approach: (i) querying a set of input images to the blackbox model to obtain predictions; and (ii) training a "knockoff" with queried image-prediction pairs. We make multiple remarkable observations: (a) querying random images from a different distribution than that of the blackbox training data results in a well-performing knockoff; (b) this is possible even when the knockoff is represented using a different architecture; and (c) our reinforcement learning approach additionally improves query sample efficiency in certain settings and provides performance gains. We validate model functionality stealing on a range of datasets and tasks, as well as on a popular image analysis API where we create a reasonable knockoff for as little as $30.

Model extraction attacks, medical image analysis, reinforcement learning

The Thirty-Fourth AAAI Conference on Artificial Intelligence

ActiveThief: Model extraction using active learning and unannotated public data

S. Pal, Y. Gupta, A. Shukla, A. Kanade, S. K. Shevade, V. Ganapathy

Published: 2020

ACM AsiACCS

Practical black-box attacks against machine learning

Papernot, N., McDaniel, P., Goodfellow, I., Jha, S., Celik, Z. B., Swami, A.

Published: 2017

PyTorch: An imperative style, high-performance deep learning library

Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, Gregory Chanan, Trevor Killeen, Zeming Lin, Natalia Gimelshein, Luca Antiga

Published: 2019

7th IEEE European Symposium on Security and Privacy, EuroS&P 2022

DNN model architecture fingerprinting attack on CPU-GPU edge devices

Kartik Patwari, Syed Mahbub Hafiz, Han Wang, Houman Homayoun, Zubair Shafiq, Chen-Nee Chuah

Published: 2022

DeepSteal: Advanced model extractions leveraging efficient weight stealing in memories

Adnan Siraj Rakin, Md Hafizul Islam Chowdhuryy, Fan Yao, Deliang Fan

CAAI Trans. Intell. Technol.

Protecting artificial intelligence IPs: a survey of watermarking and fingerprinting for machine learning

Francesco Regazzoni, Paolo Palmieri, Fethulah Smailbegovic, Rosario Cammarota, Ilia Polian

14th IEEE International Conference on Machine Learning and Applications, ICMLA 2015

MLaaS: Machine learning as a service

Mauro Ribeiro, Katarina Grolinger, Miriam A. M. Capretz

Published: 2015

Model Weight Theft With Just Noise Inputs: The Curious Case of the Petulant Attacker

Nicholas Roberts, Vinay Uday Prabhu, Matthew McAteer

Published: 2019.12.19

This paper explores the scenarios under which an attacker can claim that 'Noise and access to the softmax layer of the model is all you need' to steal the weights of a convolutional neural network whose architecture is already known. We were able to achieve 96% test accuracy using the stolen MNIST model and 82% accuracy using the stolen KMNIST model learned using only i.i.d. Bernoulli noise inputs. We posit that this theft-susceptibility of the weights is indicative of the complexity of the dataset and propose a new metric that captures the same. The goal of this dissemination is to not just showcase how far knowing the architecture can take you in terms of model stealing, but to also draw attention to this rather idiosyncratic weight learnability aspects of CNNs spurred by i.i.d. noise input. We also disseminate some initial results obtained with using the Ising probability distribution in lieu of the i.i.d. Bernoulli distribution.

Model design, data generation, model communication

International Conference on Machine Learning

Reverse-engineering deep ReLU networks

David Rolnick, et al.

Published: 2020

CVPR

MobileNetV2: Inverted residuals and linear bottlenecks

M. Sandler, A. Howard, M. Zhu, A. Zhmoginov, L.-C. Chen

Published: 2018

Model Stealing Attacks Against Inductive Graph Neural Networks

SP

Yun Shen, Xinlei He, Yufei Han, Yang Zhang

Published: 2021.12.16

Many real-world data come in the form of graphs. Graph neural networks (GNNs), a new family of machine learning (ML) models, have been proposed to fully leverage graph data to build powerful applications. In particular, the inductive GNNs, which can generalize to unseen data, become mainstream in this direction. Machine learning models have shown great potential in various tasks and have been deployed in many real-world scenarios. To train a good model, a large amount of data as well as computational resources are needed, leading to valuable intellectual property. Previous research has shown that ML models are prone to model stealing attacks, which aim to steal the functionality of the target models. However, most of them focus on the models trained with images and texts. On the other hand, little attention has been paid to models trained with graph data, i.e., GNNs. In this paper, we fill the gap by proposing the first model stealing attacks against inductive GNNs. We systematically define the threat model and propose six attacks based on the adversary's background knowledge and the responses of the target models. Our evaluation on six benchmark datasets shows that the proposed model stealing attacks against GNNs achieve promising performance.

Adversarial training, graph representation learning, machine learning methods

3rd International Conference on Learning Representations

Very deep convolutional networks for large-scale image recognition

K. Simonyan, A. Zisserman

Published: 2015

IEEE CVPR

Going deeper with convolutions

C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. E. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, A. Rabinovich

Published: 2015

IEEE CVPR

Rethinking the inception architecture for computer vision

C. Szegedy, V. Vanhoucke, S. Ioffe, J. Shlens, Z. Wojna

Published: 2016

DAWN: Dynamic Adversarial Watermarking of Neural Networks

ACM Multimedia

Sebastian Szyller, Buse Gul Atli, Samuel Marchal, N. Asokan

Published: 2019.6.3

Training machine learning (ML) models is expensive in terms of computational power, amounts of labeled data and human expertise. Thus, ML models constitute intellectual property (IP) and business value for their owners. Embedding digital watermarks during model training allows a model owner to later identify their models in case of theft or misuse. However, model functionality can also be stolen via model extraction, where an adversary trains a surrogate model using results returned from a prediction API of the original model. Recent work has shown that model extraction is a realistic threat. Existing watermarking schemes are ineffective against IP theft via model extraction since it is the adversary who trains the surrogate model. In this paper, we introduce DAWN (Dynamic Adversarial Watermarking of Neural Networks), the first approach to use watermarking to deter model extraction IP theft. Unlike prior watermarking schemes, DAWN does not impose changes to the training process but it operates at the prediction API of the protected model, by dynamically changing the responses for a small subset of queries (e.g., <0.5%) from API clients. This set is a watermark that will be embedded in case a client uses its queries to train a surrogate model. We show that DAWN is resilient against two state-of-the-art model extraction attacks, effectively watermarking all extracted surrogate models, allowing model owners to reliably demonstrate ownership (with confidence $>1- 2^{-64}$), incurring negligible loss of prediction accuracy (0.03-0.5%).

Membership inference, watermarking techniques, adversarial examples
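The DAWN abstract describes changing the API response for a small subset of queries so that a surrogate trained on those responses carries a watermark. One way to make that selection deterministic per query is a keyed hash, sketched below. This is an assumption-laden illustration: the function `watermarked_predict`, the byte-string query encoding, the use of HMAC-SHA256, and the rule for choosing the altered label are hypothetical choices, not a description of the paper's implementation.

```python
import hashlib
import hmac


def watermarked_predict(predict, query: bytes, key: bytes,
                        num_classes: int, rate: float = 0.005) -> int:
    """Return the model's label, altered for roughly `rate` of all queries.

    The decision depends only on (key, query), so a client repeating a
    query always receives the same answer.
    """
    label = predict(query)
    digest = hmac.new(key, query, hashlib.sha256).digest()
    # Map the first 8 bytes of the digest to a number in [0, 1).
    selector = int.from_bytes(digest[:8], "big") / 2 ** 64
    if selector >= rate:
        return label
    # Pick a different label; the offset is in 1..num_classes-1, so the
    # altered label never equals the true one.
    offset = 1 + int.from_bytes(digest[8:16], "big") % (num_classes - 1)
    return (label + offset) % num_classes


# Hypothetical stand-in for the protected model's prediction function.
def toy_predict(query: bytes) -> int:
    return len(query) % 10


secret_key = b"hypothetical-secret-key"
queries = [str(i).encode() for i in range(20000)]
answers = [watermarked_predict(toy_predict, q, secret_key, 10) for q in queries]
num_altered = sum(a != toy_predict(q) for a, q in zip(answers, queries))
print(f"altered {num_altered} of {len(queries)} responses")
```

Keeping the key secret means a client cannot tell which responses were altered, while the owner can later recompute the altered set to check a suspected surrogate.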

IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2019

MnasNet: Platform-aware neural architecture search for mobile

Mingxing Tan, Bo Chen, Ruoming Pang, Vijay Vasudevan, Mark Sandler, Andrew Howard, Quoc V. Le

Published: 2019

50th Annual IEEE/IFIP International Conference on Dependable Systems and Networks, DSN 2020

Leaky DNN: stealing deep-learning model secret with GPU context-switching side-channel

Junyi Wei, Yicheng Zhang, Zhe Zhou, Zhou Li, Mohammad Abdullah Al Faruque

Proceedings of the 34th Annual Computer Security Applications Conference, ACSAC 2018

I know what you see: Power side-channel attack on convolutional neural network accelerators

Lingxiao Wei, Bo Luo, Yu Li, Yannan Liu, Qiang Xu

EZClone: Improving DNN model extraction attack via shape distillation from GPU execution profiles

Jonah O’Brien Weiss, Tiago A. O. Alves, Sandip Kundu

Published: 2023

arXiv preprint

HuggingFace’s Transformers: State-of-the-art natural language processing

Thomas Wolf, Lysandre Debut, Victor Sanh, Julien Chaumond, Clement Delangue, Anthony Moi, Pierric Cistac, Tim Rault, Remi Louf, Morgan Funtowicz

Published: 2019

2021 International Conference on Electronics, Information, and Communication (ICEIC)

Time to leak: Cross-device timing attack on edge deep learning accelerator

Yoo-Seung Won, Soham Chatterjee, Dirmanto Jap, Shivam Bhasin, Arindam Basu

Published: 2021

ACM Asia Conference on Computer and Communications Security (AsiaCCS)

Cited by: 2

Model Extraction Attacks on Graph Neural Networks: Taxonomy and Realization

Bang Wu, Xiangwen Yang, Shirui Pan, Xingliang Yuan

Published: 2020.10.24

Machine learning models are shown to face a severe threat from Model Extraction Attacks, where a well-trained private model owned by a service provider can be stolen by an attacker pretending as a client. Unfortunately, prior works focus on the models trained over the Euclidean space, e.g., images and texts, while how to extract a GNN model that contains a graph structure and node features is yet to be explored. In this paper, for the first time, we comprehensively investigate and develop model extraction attacks against GNN models. We first systematically formalise the threat modelling in the context of GNN model extraction and classify the adversarial threats into seven categories by considering different background knowledge of the attacker, e.g., attributes and/or neighbour connections of the nodes obtained by the attacker. Then we present detailed methods which utilise the accessible knowledge in each threat to implement the attacks. By evaluating over three real-world datasets, our attacks are shown to extract duplicated models effectively, i.e., 84% - 89% of the inputs in the target domain have the same output predictions as the victim model.

Attack methods, attack taxonomy, knowledge graphs

Visual transformers: Token-based image representation and processing for computer vision

Bichen Wu, Chenfeng Xu, Xiaoliang Dai, Alvin Wan, Peizhao Zhang, Zhicheng Yan, Masayoshi Tomizuka, Joseph Gonzalez, Kurt Keutzer, Peter Vajda

Published: 2020

2017 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017

Aggregated residual transformations for deep neural networks

Saining Xie, Ross B. Girshick, Piotr Dollar, Zhuowen Tu, Kaiming He

Published: 2017

ActiveGuard: An active DNN IP protection technique via adversarial examples

Mingfu Xue, Shichang Sun, Can He, Yushu Zhang, Jian Wang, Weiqiang Liu

Cache Telepathy: Leveraging Shared Resource Attacks to Learn DNN Architectures

Mengjia Yan, Christopher Fletcher, Josep Torrellas

Published: 2018.8.15

Deep Neural Networks (DNNs) are fast becoming ubiquitous for their ability to attain good accuracy in various machine learning tasks. A DNN's architecture (i.e., its hyper-parameters) broadly determines the DNN's accuracy and performance, and is often confidential. Attacking a DNN in the cloud to obtain its architecture can potentially provide major commercial value. Further, attaining a DNN's architecture facilitates other, existing DNN attacks. This paper presents Cache Telepathy: a fast and accurate mechanism to steal a DNN's architecture using the cache side channel. Our attack is based on the insight that DNN inference relies heavily on tiled GEMM (Generalized Matrix Multiply), and that DNN architecture parameters determine the number of GEMM calls and the dimensions of the matrices used in the GEMM functions. Such information can be leaked through the cache side channel. This paper uses Prime+Probe and Flush+Reload to attack VGG and ResNet DNNs running OpenBLAS and Intel MKL libraries. Our attack is effective in helping obtain the architectures by very substantially reducing the search space of target DNN architectures. For example, for VGG using OpenBLAS, it reduces the search space from more than $10^{35}$ architectures to just 16.

Model extraction attack detection, hyperparameter tuning, model extraction attacks

IEEE International Symposium on Circuits and Systems, ISCAS 2020

Model reverse-engineering attack using correlation power analysis against systolic array based neural network accelerator

K. Yoshida, T. Kubota, S. Okura, M. Shiozaki, T. Fujino

2020 IEEE International Symposium on Hardware Oriented Security and Trust, HOST 2020

DeepEM: Deep neural networks model recovery through EM side-channel information leakage

Honggang Yu, Haocheng Ma, Kaichen Yang, Yiqiang Zhao, Yier Jin

IEEE Trans. Emerg. Top. Comput. Intell.

ES attack: Model stealing against deep neural networks without data hurdles

Xiaoyong Yuan, Leah Ding, Lan Zhang, Xiaolin Li, Dapeng Oliver Wu

Published: 2022

Proceedings of the British Machine Vision Conference 2016, BMVC 2016

Wide residual networks

Sergey Zagoruyko, Nikos Komodakis

Published: 2016

Comput. Commun.

AFA: adversarial fingerprinting authentication for deep neural networks

Jingjing Zhao, Qingyue Hu, Gaoyang Liu, Xiaoqiang Ma, Fei Chen, Mohammad Mehedi Hassan