AI Security Portal K Program
Machine Unlearning in Large Language Models
Abstract
Large language models (LLMs) have recently attracted significant attention for their ability to automatically generate intelligent content across a wide range of application domains. However, LLMs still suffer from significant security and privacy issues. For example, an LLM might leak private user data when subjected to hacking attacks or targeted prompts. To address this problem, this paper introduces a novel machine unlearning framework for LLMs. Our objective is to keep LLMs from producing harmful, hallucinatory, or privacy-compromising responses while retaining their standard output capabilities. To accomplish this, we use an evaluative model to pinpoint dialogues that need unlearning. We also establish a distance loss that serves as the model's negative loss, steering it away from its previous undesirable outputs. Furthermore, we compute the cluster mean of the expected outputs to formulate a positive loss, directing the model's outputs toward preferable outcomes without compromising its reasoning abilities and performance. Experimental results show that our approach meets the unlearning objectives without substantially compromising model performance.
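The abstract does not give the exact form of the two loss terms, so the following is only a minimal sketch of how a distance-based negative loss and a cluster-mean positive loss might be combined. Everything here is an assumption made for illustration: outputs are represented as plain embedding vectors, Euclidean distance is used as the distance measure, and the names `unlearning_loss`, `neg_weight`, and `pos_weight` are hypothetical rather than taken from the paper.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def euclidean(a: Vector, b: Vector) -> float:
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cluster_mean(vectors: List[Vector]) -> List[float]:
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def unlearning_loss(
    output: Vector,
    undesirable: List[Vector],
    preferred: List[Vector],
    neg_weight: float = 1.0,  # hypothetical weighting, not from the paper
    pos_weight: float = 1.0,  # hypothetical weighting, not from the paper
) -> float:
    """Illustrative combined loss (assumed form, lower is better).

    Negative term: minus the average distance to the undesirable
    outputs, so minimizing it pushes the output away from them.
    Positive term: distance to the mean of the preferred outputs,
    so minimizing it pulls the output toward that cluster.
    """
    avg_dist = sum(euclidean(output, u) for u in undesirable) / len(undesirable)
    negative = -avg_dist
    positive = euclidean(output, cluster_mean(preferred))
    return neg_weight * negative + pos_weight * positive


# Toy 2-D "embeddings" chosen by hand for the example.
undesirable = [[4.0, 5.0]]
preferred = [[0.0, 0.0], [2.0, 2.0]]

loss_near_preferred = unlearning_loss([1.0, 1.0], undesirable, preferred)
loss_near_undesirable = unlearning_loss([4.0, 5.0], undesirable, preferred)
print(loss_near_preferred < loss_near_undesirable)
```

In this toy setup, an output sitting at the preferred cluster mean scores lower than one sitting on the undesirable example, which is the behavior the abstract describes. The negative term as written is unbounded below, so a real training objective would likely need to clip or otherwise bound it; that detail is not specified in the abstract.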