Privacy Backdoors: Enhancing Membership Inference through Poisoning Pre-trained Models
Abstract
It is commonplace to produce application-specific models by fine-tuning large pre-trained models using a small bespoke dataset. The widespread availability of foundation model checkpoints on the web poses considerable risks, including the vulnerability to backdoor attacks. In this paper, we unveil a new vulnerability: the privacy backdoor attack. This black-box privacy attack aims to amplify the privacy leakage that arises when fine-tuning a model: when a victim fine-tunes a backdoored model, their training data will be leaked at a significantly higher rate than if they had fine-tuned a typical model. We conduct extensive experiments on various datasets and models, including both vision-language models (CLIP) and large language models, demonstrating the broad applicability and effectiveness of such an attack. Additionally, we carry out multiple ablation studies with different fine-tuning methods and inference strategies to thoroughly analyze this new threat. Our findings highlight a critical privacy concern within the machine learning community and call for a reevaluation of safety protocols in the use of open-source pre-trained models.
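To make the notion of "leakage rate" concrete, here is a minimal sketch of a generic loss-threshold membership inference test, scored as the true-positive rate at a bounded false-positive rate. It is an illustration only and not necessarily the attack or metric used in the paper. The function name `tpr_at_fpr` and its parameters are hypothetical, and it assumes the attacker can obtain a per-sample loss from the fine-tuned model.

```python
def tpr_at_fpr(member_losses, nonmember_losses, max_fpr=0.01):
    """True-positive rate of a loss-threshold attack at a bounded false-positive rate.

    Assumption for this sketch: a sample is predicted to be a training
    member when its loss is strictly below a threshold, because fine-tuned
    models tend to have lower loss on the data they were trained on.
    """
    if not member_losses or not nonmember_losses:
        raise ValueError("both loss lists must be non-empty")
    if not 0.0 <= max_fpr < 1.0:
        raise ValueError("max_fpr must be in [0, 1)")

    # Number of non-members the attack is allowed to flag incorrectly.
    allowed_fp = int(max_fpr * len(nonmember_losses))

    # Choose the threshold so that at most `allowed_fp` non-members
    # have a loss strictly below it.
    threshold = sorted(nonmember_losses)[allowed_fp]

    # Count the members that the attack correctly flags.
    hits = sum(1 for loss in member_losses if loss < threshold)
    return hits / len(member_losses)
```

Under this framing, a privacy backdoor would succeed if, after the victim fine-tunes the poisoned checkpoint, member losses separate more cleanly from non-member losses, so that `tpr_at_fpr` comes out higher than it would for a clean checkpoint at the same `max_fpr`.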