CoSPED: Consistent Soft Prompt Targeted Data Extraction and Defense

arxiv

AI Security Portal bot

Information in the literature database is collected automatically.

Source

https://arxiv.org/abs/2510.11137

PDF

https://arxiv.org/pdf/2510.11137

Paper Information

Author: Yang Zhuochen, Fok Kar Wai, Thing Vrizlynn
Published: 10-13-2025
Affiliation: Cybersecurity Strategic Technology Centre
Country: Singapore
Conference

Labels Estimated by AI

Defense Mechanism, Privacy Enhancing Technology, Improvement of Learning

These labels were automatically added by AI and may be inaccurate.

Abstract

Large language models have gained widespread attention recently, but their potential security vulnerabilities, especially privacy leakage, are also becoming apparent. To test and evaluate data extraction risks in LLMs, we propose CoSPED, short for Consistent Soft Prompt targeted data Extraction and Defense. We introduce several innovative components, including Dynamic Loss, Additive Loss, Common Loss, and a Self Consistency Decoding Strategy, and test them to enhance the consistency of the soft prompt tuning process. Through extensive experimentation with various combinations, we achieve an extraction rate of 65.2% at a 50-token prefix comparison. Comparisons of CoSPED with other reference works confirm its superior extraction rates. We evaluate CoSPED in additional scenarios, achieving an extraction rate of 51.7% on the Pythia model and introducing a cross-model comparison. Finally, we explore defense through Rank-One Model Editing and reduce the extraction rate to 1.6%, which shows that our analysis of extraction mechanisms can directly inform effective mitigation strategies against soft prompt-based attacks.
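The abstract reports extraction rates but does not define the metric. As a rough illustration only, the sketch below assumes the exact-match convention often used in targeted extraction benchmarks: the model is prompted with a fixed-length prefix (50 tokens here) and a sample counts as extracted only if the generated suffix equals the true suffix token for token. The function name `extraction_rate` and the token-ID list inputs are hypothetical and are not taken from the paper.

```python
from typing import Sequence


def extraction_rate(
    generated_suffixes: Sequence[Sequence[int]],
    reference_suffixes: Sequence[Sequence[int]],
) -> float:
    """Return the fraction of samples whose generated suffix matches the
    reference suffix exactly (assumed metric, not the paper's definition)."""
    if len(generated_suffixes) != len(reference_suffixes):
        raise ValueError("both inputs must contain the same number of samples")
    if not generated_suffixes:
        return 0.0
    # A sample is a hit only when every token ID agrees, in order.
    hits = sum(
        1
        for gen, ref in zip(generated_suffixes, reference_suffixes)
        if list(gen) == list(ref)
    )
    return hits / len(generated_suffixes)


# Toy usage: two samples, one exact match and one mismatch in the last token.
generated = [[11, 12, 13], [21, 22, 23]]
reference = [[11, 12, 13], [21, 22, 99]]
rate = extraction_rate(generated, reference)
```

Under this assumed definition, a reported rate of 65.2% would mean that roughly two out of three evaluated samples were reproduced exactly.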

External Datasets

Dataset D1

Dataset D2

The Pile