AI Model Disgorgement: Methods and Choices | AI Security Portal


arxiv

AI Model Disgorgement: Methods and Choices

AI Security Portal bot

Information in the literature database is collected automatically.

Source

https://arxiv.org/abs/2304.03545

PDF

https://arxiv.org/pdf/2304.03545

Paper Information

Author: Alessandro Achille, Michael Kearns, Carson Klingenberg, Stefano Soatto
Published: 2023-04-07
Affiliation: AWS AI
Country: United States of America
Conference: Computing Research Repository (CoRR)

Labels Estimated by AI

DNN IP Protection Method; Data Generation; Watermarking

These labels were automatically added by AI and may be inaccurate.
For details, see About Literature Database.

Abstract

Responsible use of data is an indispensable part of any machine learning (ML) implementation. ML developers must carefully collect and curate their datasets, and document their provenance. They must also make sure to respect intellectual property rights, preserve individual privacy, and use data in an ethical way. Over the past few years, ML models have significantly increased in size and complexity. These models require a very large amount of data and compute capacity to train, to the extent that any defects in the training corpus cannot be trivially remedied by retraining the model from scratch. Despite sophisticated controls on training data and a significant amount of effort dedicated to ensuring that training corpora are properly composed, the sheer volume of data required for the models makes it challenging to manually inspect each datum comprising a training corpus. One potential fix for training corpus data defects is model disgorgement -- the elimination of not just the improperly used data, but also the effects of improperly used data on any component of an ML model. Model disgorgement techniques can be used to address a wide range of issues, such as reducing bias or toxicity, increasing fidelity, and ensuring responsible usage of intellectual property. In this paper, we introduce a taxonomy of possible disgorgement methods that are applicable to modern ML systems. In particular, we investigate the meaning of "removing the effects" of data in the trained model in a way that does not require retraining from scratch.

External Datasets

web-scale data

fine-grained visual classification tasks

References

Where is the information in a deep neural network?

Alessandro Achille, Giovanni Paolini, Stefano Soatto

Published: 2019

On binding objects to symbols: Learning physical concepts to understand real from fake

Alessandro Achille, Stefano Soatto

Published: 2022

Proceedings of the AAAI Conference on Artificial Intelligence

Updates in human-AI teams: Understanding and addressing the performance/compatibility tradeoff

Gagan Bansal, Besmira Nushi, Ece Kamar, Daniel S Weld, Walter S Lasecki, Eric Horvitz

Published: 2019

Influence functions in deep learning are fragile

Samyadeep Basu, Philip Pope, Soheil Feizi

Published: 2020

2021 IEEE Symposium on Security and Privacy (SP)

Machine unlearning

Lucas Bourtoule, Varun Chandrasekaran, Christopher A. Choquette-Choo, Hengrui Jia, Adelin Travers, Baiwu Zhang, David Lie, Nicolas Papernot

Published: 2021

Proc. of STOC

When is memorization of irrelevant training data necessary for high-accuracy learning?

Gavin Brown, Mark Bun, Vitaly Feldman, Adam Smith, Kunal Talwar

Published: 2021

2015 IEEE Symposium on Security and Privacy (SP)

Towards making systems forget with machine unlearning

Y. Cao, J. Yang

Published: 2015

Journal of the Royal Statistical Society Series B: Statistical Methodology

Gaussian differential privacy

Jinshuo Dong, Aaron Roth, Weijie J Su

Published: 2022

Introspective cross-attention probing for lightweight transfer of pre-trained models

Yonatan Dukler, Alessandro Achille, Hao Yang, Varsha Vivek, Luca Zancato, Ben Bowman, Avinash Ravichandran, Charless Fowlkes, Ashwin Swaminathan, Stefano Soatto

Published: 2023

preprint

SAFE: Machine unlearning with shard graphs

Yonatan Dukler, Benjamin Bowman, Alessandro Achille, Aditya Golatkar, Ashwin Swaminathan, Stefano Soatto

Published: 2023

STOC

Does learning require memorization? a short tale about a long tail

Vitaly Feldman

Published: 2020

Advances in neural information processing systems

Making AI forget you: Data deletion in machine learning

Antonio Ginart, Melody Guan, Gregory Valiant, James Y Zou

Published: 2019

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition

Mixed-privacy forgetting in deep networks

A. Golatkar, A. Achille, A. Ravichandran, M. Polito, S. Soatto

Published: 2021

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Eternal Sunshine of the Spotless Net: Selective Forgetting in Deep Networks

Aditya Golatkar, Alessandro Achille, Stefano Soatto

Published: 2019-11-13

We explore the problem of selectively forgetting a particular subset of the data used for training a deep neural network. While the effects of the data to be forgotten can be hidden from the output of the network, insights may still be gleaned by probing deep into its weights. We propose a method for "scrubbing" the weights clean of information about a particular set of training data. The method does not require retraining from scratch, nor access to the data originally used for training. Instead, the weights are modified so that any probing function of the weights is indistinguishable from the same function applied to the weights of a network trained without the data to be forgotten. This condition is a generalized and weaker form of Differential Privacy. Exploiting ideas related to the stability of stochastic gradient descent, we introduce an upper bound on the amount of information remaining in the weights, which can be estimated efficiently even for deep neural networks.

Machine learning; Information Security; Trigger Detection

European Conference on Computer Vision (ECCV)

Forgetting Outside the Box: Scrubbing Deep Networks of Information Accessible from Input-Output Observations

Aditya Golatkar, Alessandro Achille, Stefano Soatto

Published: 2020-03-06

We describe a procedure for removing dependency on a cohort of training data from a trained deep network that improves upon and generalizes previous methods to different readout functions and can be extended to ensure forgetting in the activations of the network. We introduce a new bound on how much information can be extracted per query about the forgotten cohort from a black-box network for which only the input-output behavior is observed. The proposed forgetting procedure has a deterministic part derived from the differential equations of a linearized version of the model, and a stochastic part that ensures information destruction by adding noise tailored to the geometry of the loss landscape. We exploit the connections between the activation and weight dynamics of a DNN inspired by Neural Tangent Kernels to compute the information in the activations.

Deep Learning Method; Machine learning; Information Hiding Techniques
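
The abstract above describes a forgetting procedure with a deterministic part and a stochastic part. The following is a minimal sketch of that general idea on a toy problem, not the authors' procedure: it assumes a one-parameter ridge regression (where a single Newton step toward the retained-data optimum is exact because the objective is quadratic), and it assumes Gaussian noise whose scale shrinks as curvature grows. The names `fit`, `forget`, `lam`, and `sigma` are hypothetical and chosen for illustration only.

```python
import random

# Toy setting (an assumption for illustration): prediction = w * x,
# objective = sum of 0.5*(w*x - y)**2 over the data + 0.5*lam*w**2.

def fit(points, lam):
    """Closed-form minimiser of the ridge objective."""
    sxx = sum(x * x for x, _ in points)
    sxy = sum(x * y for x, y in points)
    return sxy / (sxx + lam)

def forget(w, points, forget_idx, lam, sigma, rng):
    """Shift w toward the retained-data optimum, then add noise."""
    retained = [p for i, p in enumerate(points) if i not in forget_idx]
    # Gradient and curvature of the retained-data objective at the current w.
    g = sum((w * x - y) * x for x, y in retained) + lam * w
    h = sum(x * x for x, _ in retained) + lam
    # Deterministic part: one Newton step (exact for a quadratic objective).
    w_shifted = w - g / h
    # Stochastic part (assumed form): more noise where the loss is flatter.
    noise = rng.gauss(0.0, sigma / h ** 0.5)
    return w_shifted + noise

if __name__ == "__main__":
    data = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2), (4.0, 7.8)]
    w_full = fit(data, lam=1.0)
    w_scrubbed = forget(w_full, data, {0}, 1.0, 0.1, random.Random(0))
    print("trained on all data:", w_full)
    print("after forgetting point 0:", w_scrubbed)
    print("retrained without point 0:", fit(data[1:], lam=1.0))
```

With `sigma` set to zero the result coincides with retraining on the retained data; in a deep network the linearization is only approximate, which is why the noise term matters there.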

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition

Mixed differential privacy in computer vision

Aditya Golatkar, Alessandro Achille, Yu-Xiang Wang, Aaron Roth, Michael Kearns, Stefano Soatto

Published: 2022

Estimating informativeness of samples with smooth unique information

Hrayr Harutyunyan, Alessandro Achille, Giovanni Paolini, Orchid Majumder, Avinash Ravichandran, Rahul Bhotika, Stefano Soatto

Published: 2021

Advances in neural information processing systems

GANs trained by a two time-scale update rule converge to a local Nash equilibrium

Martin Heusel, Hubert Ramsauer, Thomas Unterthiner, Bernhard Nessler, Sepp Hochreiter

Published: 2017

First IEEE Conference on Secure and Trustworthy Machine Learning

No matter how you slice it: Machine unlearning with SISA comes at the expense of minority classes

Korbinian Koch, Marcus Soll

International Conference on Machine Learning (ICML)

Understanding Black-box Predictions via Influence Functions

Pang Wei Koh, Percy Liang

Published: 2017-03-15

How can we explain the predictions of a black-box model? In this paper, we use influence functions -- a classic technique from robust statistics -- to trace a model's prediction through the learning algorithm and back to its training data, thereby identifying training points most responsible for a given prediction. To scale up influence functions to modern machine learning settings, we develop a simple, efficient implementation that requires only oracle access to gradients and Hessian-vector products. We show that even on non-convex and non-differentiable models where the theory breaks down, approximations to influence functions can still provide valuable information. On linear models and convolutional neural networks, we demonstrate that influence functions are useful for multiple purposes: understanding model behavior, debugging models, detecting dataset errors, and even creating visually-indistinguishable training-set attacks.

Improvement of Learning Poisoning Explainability Evaluation
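
The influence-function idea summarized in the abstract above can be illustrated on a model small enough to check by hand. This is a minimal sketch under stated assumptions, not the paper's implementation: it uses a one-parameter ridge regression so the Hessian is a single number, and it compares the first-order estimate of removing a training point with actually refitting without it. All function names and the data are hypothetical.

```python
# Toy setting (an assumption for illustration): prediction = w * x,
# objective = sum of 0.5*(w*x - y)**2 over the data + 0.5*lam*w**2.

def fit(points, lam):
    """Closed-form minimiser of the ridge objective."""
    sxx = sum(x * x for x, _ in points)
    sxy = sum(x * y for x, y in points)
    return sxy / (sxx + lam)

def loss_grad(w, x, y):
    """Gradient of one example's squared-error loss with respect to w."""
    return (w * x - y) * x

def estimated_removal_effect(points, lam, train_idx, test_point):
    """First-order estimate of the test-loss change if one point is removed."""
    w = fit(points, lam)
    hessian = sum(x * x for x, _ in points) + lam
    x, y = points[train_idx]
    # Removing a point is like up-weighting it by -1: delta_w ~ H^{-1} * grad.
    delta_w = loss_grad(w, x, y) / hessian
    xt, yt = test_point
    return loss_grad(w, xt, yt) * delta_w

def actual_removal_effect(points, lam, train_idx, test_point):
    """Exact test-loss change obtained by refitting without the point."""
    xt, yt = test_point

    def test_loss(w):
        return 0.5 * (w * xt - yt) ** 2

    kept = points[:train_idx] + points[train_idx + 1:]
    return test_loss(fit(kept, lam)) - test_loss(fit(points, lam))

if __name__ == "__main__":
    data = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2), (4.0, 7.8), (5.0, 10.1)]
    test = (6.0, 12.0)
    for i in range(len(data)):
        est = estimated_removal_effect(data, 1.0, i, test)
        act = actual_removal_effect(data, 1.0, i, test)
        print(i, est, act)
```

The estimate tracks the refit result most closely for points whose removal barely changes the curvature; for larger models the inverse Hessian is not formed explicitly, which is where the Hessian-vector products mentioned in the abstract come in.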

Privacy adhering machine un-learning in NLP

Vinayshekhar Bannihatti Kumar, Rashmi Gangadharaiah, Dan Roth

Published: 2022

Proceedings of the IEEE conference on computer vision and pattern recognition

Universal adversarial perturbations

Seyed-Mohsen Moosavi-Dezfooli, Alhussein Fawzi, Omar Fawzi, Pascal Frossard

Published: 2017

Learning transferable visual models from natural language supervision

Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, Gretchen Krueger, Ilya Sutskever

Published: 2021

Vision and attention

Change blindness: Implications for the nature of visual attention

Ronald A Rensink

Published: 2001

Proceedings of the USENIX Security Symposium

Glaze: Protecting artists from style mimicry by text-to-image models

Shawn Shan, Jenna Cryan, Emily Wenger, Haitao Zheng, Rana Hanocka, Ben Y. Zhao

Published: 2023

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition

Towards backward-compatible representation learning

Yantao Shen, Yuanjun Xiong, Wei Xia, Stefano Soatto

Published: 2020

Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining

An empirical analysis of backward compatibility in machine learning systems

Megha Srivastava, Besmira Nushi, Ece Kamar, Shital Shah, Eric Horvitz

Published: 2020

IEEE transactions on image processing

Image quality assessment: from error visibility to structural similarity

Z. Wang, A. C. Bovik, H. R. Sheikh, E. P. Simoncelli

Published: 2004

2020 IEEE International Conference on Multimedia & Expo Workshops (ICMEW)

Unsupervised deep learning for just noticeable difference estimation

Yuhao Wu, Weiping Ji, Jinjian Wu

Published: 2020

Advances in neural information processing systems

Unsupervised data augmentation for consistency training

Qizhe Xie, Zihang Dai, Eduard Hovy, Thang Luong, Quoc Le

Published: 2020

Proceedings of the Thirty-First International Joint Conference on Artificial Intelligence, IJCAI-22

ARCANE: An efficient architecture for exact machine unlearning

H. Yan, X. Li, Z. Guo, H. Li, F. Li, X. Lin

Published: 2022

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition

Positive-congruent training: Towards regression-free model updates

Sijie Yan, Yuanjun Xiong, Kaustav Kundu, Shuo Yang, Siqi Deng, Meng Wang, Wei Xia, Stefano Soatto

Published: 2021

The Eleventh International Conference on Learning Representations

Towards robustness certification against universal perturbations

Yi Zeng, Zhouxing Shi, Ming Jin, Feiyang Kang, Lingjuan Lyu, Cho-Jui Hsieh, Ruoxi Jia

Published: 2023

2020 IEEE symposium series on computational intelligence (SSCI)

Sim-to-real transfer in deep reinforcement learning for robotics: a survey

Wenshuai Zhao, Jorge Peña Queralta, Tomi Westerlund

Published: 2020

Neurocomputing

Exploring user historical semantic and sentiment preference for microblog sentiment classification

Xiaofei Zhu, Jie Wu, Ling Zhu, Jiafeng Guo, Ran Yu, Katarina Boland, Stefan Dietze

Published: 2021