Abstract
Large language models have gained widespread attention recently, but their
potential security vulnerabilities, especially privacy leakage, are also
becoming apparent. To test and evaluate data extraction risks in LLMs, we
propose CoSPED, short for Consistent Soft Prompt targeted data Extraction and
Defense. We introduce several innovative components, including Dynamic Loss,
Additive Loss, Common Loss, and a Self Consistency Decoding Strategy, and test
how they enhance the consistency of the soft prompt tuning process. Through extensive
experimentation with various combinations, we achieved an extraction rate of
65.2% at a 50-token prefix comparison. Our comparisons of CoSPED with other
reference works confirm its superior extraction rates. We also evaluate CoSPED in
additional scenarios, achieving an extraction rate of 51.7% on the Pythia model
and introducing a cross-model comparison. Finally, we explore defense through Rank-One Model
Editing and reduce the extraction rate to 1.6%, which shows
that our analysis of extraction mechanisms can directly inform effective
mitigation strategies against soft prompt-based attacks.