Abstract
Large language models (LLMs) trained over extensive corpora risk memorizing
sensitive, copyrighted, or toxic content. To address this, we propose
\textbf{OBLIVIATE}, a robust unlearning framework that removes targeted data
while preserving model utility. The framework follows a structured process:
extracting target tokens, building retain sets, and fine-tuning with a tailored
loss function comprising three components -- masking, distillation, and world
fact. Using low-rank adapters (LoRA) ensures efficiency without compromising
unlearning quality. We conduct experiments on multiple datasets, including
the Harry Potter series, WMDP, and TOFU, using a comprehensive suite of metrics:
\emph{forget quality} (via a new document-level memorization score),
\emph{model utility}, and \emph{fluency}. Results demonstrate the framework's effectiveness
in resisting membership inference attacks, minimizing the impact on retained
data, and maintaining robustness across diverse scenarios.