Digital Forgetting in Large Language Models: A Survey of Unlearning Methods | AI Security Portal


arXiv


AI Security Portal bot

Information in the literature database is collected automatically.

Source

https://arxiv.org/abs/2404.02062

PDF

https://arxiv.org/pdf/2404.02062

Bibliographic information

Authors: Alberto Blanco-Justicia; Najeeb Jebreel; Benet Manzanares; David Sánchez; Josep Domingo-Ferrer; Guillem Collell; Kuan Eeik Tan
Published: 2024-04-03
Affiliation: Universitat Rovira i Virgili
Country: Spain
Venue: Artif. Intell. Rev.

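AI-estimated labels

Prompt injection; Machine unlearning; LLM performance evaluation

Note: These labels were added automatically by AI and may therefore be inaccurate.
For details, see "About the literature database".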

Abstract

The objective of digital forgetting is, given a model with undesirable knowledge or behavior, to obtain a new model in which the detected issues are no longer present. The motivations for forgetting include privacy protection, copyright protection, elimination of biases and discrimination, and prevention of harmful content generation. Digital forgetting has to be effective (that is, the new model must actually have forgotten the undesired knowledge/behavior), retain the performance of the original model on the desirable tasks, and be scalable (in particular, forgetting has to be more efficient than retraining from scratch on just the tasks/data to be retained). This survey focuses on forgetting in large language models (LLMs). We first provide background on LLMs, including their components, the types of LLMs, and their usual training pipeline. Second, we describe the motivations, types, and desired properties of digital forgetting. Third, we introduce the approaches to digital forgetting in LLMs, among which unlearning methodologies stand out as the state of the art. Fourth, we provide a detailed taxonomy of machine unlearning methods for LLMs, and we survey and compare current approaches. Fifth, we detail the datasets, models, and metrics used to evaluate forgetting, retaining, and runtime. Sixth, we discuss challenges in the area. Finally, we provide some concluding remarks.

External datasets

PKU-SafeRLHF

HaluEval

RealToxicityPrompts

Civil Comments

Harry Potter Universe

WinoBias

WinoMT

CrowS Pairs

Winogender Schemas

Bias in Bios

StereoSet

English Universal Dependencies

RedPajama

Training Data Extraction Challenge

Pile

Harry Potter and the Sorcerer’s Stone

SST2

Amazon polarity

Yelp polarity

IMDB

SAMSum

LEDGAR

PersonaChat

IWSLT14

Enron emails

zsRE

CounterFact

CodeParrot GitHub Code

References

Advances in Neural Information Processing Systems (NIPS)

Attention is All You Need

Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.

Published: 2017

Efficient estimation of word representations in vector space

T. Mikolov, K. Chen, G. Corrado, J. Dean

Published: 2013

Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP)

Learning phrase representations using RNN encoder–decoder for statistical machine translation

Kyunghyun Cho, Bart van Merriënboer, Caglar Gulcehre, Dzmitry Bahdanau, Fethi Bougares, Holger Schwenk, Yoshua Bengio

Published: 2014

Advances in Neural Information Processing Systems 27

Sequence to sequence learning with neural networks

Ilya Sutskever, Oriol Vinyals, Quoc V. Le

Published: 2014

Neural machine translation by jointly learning to align and translate

Dzmitry Bahdanau, Kyunghyun Cho, Yoshua Bengio

Published: 2014

Pretrained Transformers as Universal Computation Engines

Lu, K., Grover, A., Abbeel, P., Mordatch, I.

Published: 2021

Proceedings of NAACL-HLT

BERT: Pre-training of deep bidirectional transformers for language understanding

Jacob Devlin, Ming-Wei Chang, Kenton Lee, Kristina Toutanova

Published: 2019

Improving language understanding by generative pre-training

Alec Radford, Karthik Narasimhan, Tim Salimans, Ilya Sutskever

Published: 2018

LLaMA: Open and efficient foundation language models

Hugo Touvron, Thibaut Lavril, Gautier Izacard, Xavier Martinet, Marie-Anne Lachaux, Timothée Lacroix, Baptiste Rozière, Naman Goyal, Eric Hambro, Faisal Azhar

Published: 2023

Journal of Machine Learning Research

Exploring the limits of transfer learning with a unified text-to-text transformer

Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, Peter J. Liu

Published: 2020