The deployment of large language models (LLMs) like ChatGPT and Gemini has
shown their powerful natural language generation capabilities. However, these
models can inadvertently learn and retain sensitive information and harmful
content during training, raising significant ethical and legal concerns.
Machine unlearning has been proposed as a potential
solution. While existing unlearning methods take into account the specific
characteristics of LLMs, they often suffer from high computational demands,
limited applicability, or the risk of catastrophic forgetting. To address these
limitations, we propose a lightweight behavioral unlearning framework based on
Retrieval-Augmented Generation (RAG) technology. By modifying the external
knowledge base of RAG, we simulate the effects of forgetting without directly
interacting with the LLM being unlearned. We formulate the construction of
unlearned knowledge as a constrained optimization problem and derive two key components
that underpin the effectiveness of RAG-based unlearning. This RAG-based
approach is particularly effective for closed-source LLMs, where existing
unlearning methods often fail. We evaluate our framework through extensive
experiments on both open-source and closed-source models, including ChatGPT,
Gemini, Llama-2-7b-chat, and PaLM 2. The results demonstrate that our approach
meets five key unlearning criteria: effectiveness, universality, harmlessness,
simplicity, and robustness. The approach also extends to multimodal
large language models and LLM-based agents.
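
The idea of simulating forgetting by editing only the external knowledge base can be illustrated with a minimal sketch. It is not the framework's actual implementation: the `Entry` fields, the `forget` helper, the word-overlap retriever (a stand-in for embedding similarity), the threshold, and the wording of the refusal instruction are all assumptions made for illustration.

```python
import re
from dataclasses import dataclass


@dataclass
class Entry:
    match_text: str   # hypothetical field: text compared against incoming queries
    instruction: str  # hypothetical field: text injected into the prompt on a hit


def tokens(text: str) -> set:
    """Lowercased word set of a string."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query: str, kb: list, threshold: float = 0.2) -> list:
    """Toy retriever: Jaccard word overlap stands in for embedding similarity."""
    q = tokens(query)
    hits = []
    for entry in kb:
        t = tokens(entry.match_text)
        union = q | t
        score = len(q & t) / len(union) if union else 0.0
        if score >= threshold:
            hits.append(entry)
    return hits


def forget(topic_description: str, kb: list) -> None:
    """Add an 'unlearned knowledge' entry; the LLM itself is never touched."""
    kb.append(Entry(
        match_text=topic_description,
        instruction="The topic of this question is confidential. Decline to answer.",
    ))


def build_prompt(query: str, kb: list) -> str:
    """Assemble the prompt that would be sent to the (unmodified) LLM."""
    context = "\n".join(e.instruction for e in retrieve(query, kb))
    if context:
        return f"Context:\n{context}\n\nQuestion: {query}"
    return f"Question: {query}"


if __name__ == "__main__":
    kb = []
    forget("Harry Potter novels by J. K. Rowling", kb)
    print(build_prompt("Who wrote the Harry Potter novels?", kb))
    print(build_prompt("What is the capital of France?", kb))
```

In this sketch, a query about the forgotten topic retrieves the added entry and the prompt carries the refusal instruction, while unrelated queries reach the model unchanged. Because only the knowledge base is edited, the same pattern would apply to a model reachable solely through an API.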