SOFT: Selective Data Obfuscation for Protecting LLM Fine-tuning against Membership Inference Attacks

TOP Literature Database SOFT: Selective Data Obfuscation for Protecting LLM Fine-tuning against Membership Inference Attacks

arxiv

AI Security Portal bot

Information in the literature database is collected automatically.

Source

https://arxiv.org/abs/2506.10424

PDF

https://arxiv.org/pdf/2506.10424

Paper Information

Author: Kaiyuan Zhang,Siyuan Cheng,Hanxi Guo,Yuetian Chen,Zian Su,Shengwei An,Yuntao Du,Charles Fleming,Ashish Kundu,Xiangyu Zhang,Ninghui Li
Published: 6-12-2025
Affiliation: Purdue University
Country: United States of America
Conference

Labels Estimated by AI

Prompt Injection Privacy Protection Method Prompt leaking

These labels were automatically added by AI and may be inaccurate.
For details, see About Literature Database.

Abstract

Large language models (LLMs) have achieved remarkable success and are widely adopted for diverse applications. However, fine-tuning these models often involves private or sensitive information, raising critical privacy concerns. In this work, we conduct the first comprehensive study evaluating the vulnerability of fine-tuned LLMs to membership inference attacks (MIAs). Our empirical analysis demonstrates that MIAs exploit the loss reduction during fine-tuning, making them highly effective in revealing membership information. These findings motivate the development of our defense. We propose SOFT (\textbf{S}elective data \textbf{O}bfuscation in LLM \textbf{F}ine-\textbf{T}uning), a novel defense technique that mitigates privacy leakage by leveraging influential data selection with an adjustable parameter to balance utility preservation and privacy protection. Our extensive experiments span six diverse domains and multiple LLM architectures and scales. Results show that SOFT effectively reduces privacy risks while maintaining competitive model performance, offering a practical and scalable solution to safeguard sensitive information in fine-tuned LLMs.

External Datasets

Pile

ArXiv

Wikipedia

GitHub

HackerNews

PubMed

Pile CC