AI Security Portal K Program
AI Model Disgorgement: Methods and Choices
Abstract
Responsible use of data is an indispensable part of any machine learning (ML) implementation. ML developers must carefully collect and curate their datasets and document their provenance. They must also respect intellectual property rights, preserve individual privacy, and use data ethically. Over the past few years, ML models have grown significantly in size and complexity. Training them requires so much data and compute capacity that defects in the training corpus cannot be trivially remedied by retraining the model from scratch. Despite sophisticated controls on training data and significant effort to ensure that training corpora are properly composed, the sheer volume of data these models require makes it challenging to manually inspect every datum in a training corpus. One potential fix for defects in training data is model disgorgement: the elimination of not just the improperly used data, but also the effects of that data on any component of an ML model. Model disgorgement techniques can address a wide range of issues, such as reducing bias or toxicity, increasing fidelity, and ensuring responsible use of intellectual property. In this paper, we introduce a taxonomy of possible disgorgement methods applicable to modern ML systems. In particular, we investigate what it means to "remove the effects" of data from a trained model in a way that does not require retraining from scratch.
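To make "removing the effects of data without retraining from scratch" concrete, here is a minimal sketch of one well-known approach: the sharding idea behind SISA ("Machine unlearning", Bourtoule et al.). It is not the taxonomy or method of this paper. The training set is split into disjoint shards, each shard trains an independent sub-model, and predictions are aggregated by majority vote. Forgetting a datum then means retraining only the one shard that contained it. The toy nearest-centroid classifier and the names `train_centroids`, `ShardedEnsemble`, and `forget` are hypothetical and chosen only for illustration; SISA's slicing and checkpointing are omitted.

```python
from collections import Counter
from statistics import mean


def train_centroids(data):
    """Toy sub-model (hypothetical): per-label mean of scalar features."""
    by_label = {}
    for x, y in data:
        by_label.setdefault(y, []).append(x)
    return {y: mean(xs) for y, xs in by_label.items()}


def predict_one(centroids, x):
    """Return the label whose centroid is closest to x."""
    return min(centroids, key=lambda y: abs(centroids[y] - x))


class ShardedEnsemble:
    """Hypothetical SISA-style ensemble: one sub-model per disjoint shard."""

    def __init__(self, data, num_shards):
        # Round-robin partition into disjoint shards.
        self.shards = [[] for _ in range(num_shards)]
        for i, item in enumerate(data):
            self.shards[i % num_shards].append(item)
        self.models = [train_centroids(s) for s in self.shards]

    def predict(self, x):
        # Majority vote over non-empty sub-models.
        votes = Counter(predict_one(m, x) for m in self.models if m)
        return votes.most_common(1)[0][0]

    def forget(self, item):
        """Delete one datum and retrain only the shard that held it.

        The retrained sub-model is identical to one trained from scratch
        on that shard without the datum; other shards never saw it.
        """
        for k, shard in enumerate(self.shards):
            if item in shard:
                shard.remove(item)
                self.models[k] = train_centroids(shard)
                return k
        raise ValueError("item not in training data")


if __name__ == "__main__":
    data = [(0.0, "a"), (0.2, "a"), (1.0, "b"),
            (1.2, "b"), (0.1, "a"), (1.1, "b")]
    ens = ShardedEnsemble(data, num_shards=3)
    print(ens.predict(0.05))          # → a
    print(ens.forget((0.2, "a")))     # → 1 (only shard 1 is retrained)
    print(ens.predict(0.05))          # → a
```

The design trade-off is that deletion cost scales with shard size rather than corpus size, at the price of each sub-model seeing less data.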