OBLIVIATE: Robust and Practical Machine Unlearning for Large Language Models

Source

https://arxiv.org/abs/2505.04416

PDF

https://arxiv.org/pdf/2505.04416

Paper Information

Authors: Xiaoyu Xu, Minxin Du, Qingqing Ye, Haibo Hu
Published: 2025-05-07
Updated: 2025-09-09
Affiliation: The Hong Kong Polytechnic University
Country: Hong Kong
Conference: Computing Research Repository (CoRR)

Labels Estimated by AI

Model DoS, Token Identification Method, Performance Evaluation

These labels were automatically added by AI and may be inaccurate.

Abstract

Large language models (LLMs) trained over extensive corpora risk memorizing sensitive, copyrighted, or toxic content. To address this, we propose OBLIVIATE, a robust unlearning framework that removes targeted data while preserving model utility. The framework follows a structured process: extracting target tokens, building retain sets, and fine-tuning with a tailored loss function that has three components (masking, distillation, and world fact). Using low-rank adapters (LoRA) ensures efficiency without compromising unlearning quality. We conduct experiments on multiple datasets, including the Harry Potter series, WMDP, and TOFU, using a comprehensive suite of metrics: forget quality (via a new document-level memorization score), model utility, and fluency. The results demonstrate OBLIVIATE's effectiveness in resisting membership inference attacks, minimizing the impact on retained data, and maintaining robustness across diverse scenarios.
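
The abstract names the three loss components but does not define them. As a rough illustration only, the sketch below shows one way such a composite objective could be assembled for a single token position. Every function name, the exact form of each term, and the equal default weights are assumptions made for this example; they are not taken from the paper and should not be read as the authors' formulation.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def masking_loss(logits, target_ids):
    """Assumed form: penalize probability mass on tokens marked for forgetting."""
    probs = softmax(logits)
    mass = sum(probs[i] for i in target_ids)
    return -math.log(max(1.0 - mass, 1e-12))

def distillation_loss(student_logits, teacher_logits):
    """Assumed form: KL(teacher || student) at a retain-set position."""
    p = softmax(teacher_logits)
    q = softmax(student_logits)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def world_fact_loss(logits, gold_id):
    """Assumed form: next-token cross-entropy on a general-knowledge example."""
    return -math.log(softmax(logits)[gold_id])

def combined_loss(l_mask, l_distill, l_fact, weights=(1.0, 1.0, 1.0)):
    """Weighted sum of the three terms; the weights are hypothetical."""
    w_mask, w_distill, w_fact = weights
    return w_mask * l_mask + w_distill * l_distill + w_fact * l_fact

if __name__ == "__main__":
    # Toy 4-token vocabulary; token 0 is the one to forget.
    l_mask = masking_loss([2.0, 0.0, 0.0, 0.0], target_ids={0})
    l_distill = distillation_loss([0.1, 0.2, 0.3, 0.4], [0.0, 0.2, 0.3, 0.5])
    l_fact = world_fact_loss([0.0, 1.0, 0.0, 0.0], gold_id=1)
    print(combined_loss(l_mask, l_distill, l_fact))
```

In this toy setup the masking term shrinks as the model assigns less probability to the target token, while the other two terms stay small only if behavior on retained and general-knowledge text is unchanged. That is the trade-off between forgetting and utility that the abstract describes. In a LoRA setting, only the adapter weights would receive gradients from such an objective.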

External Datasets

Harry Potter

WMDP

TOFU