Fragments to Facts: Partial-Information Fragment Inference from LLMs

arxiv

AI Security Portal bot

Information in the literature database is collected automatically.

Source

https://arxiv.org/abs/2505.13819

PDF

https://arxiv.org/pdf/2505.13819

Paper Information

Authors: Lucas Rosenblatt, Bin Han, Robert Wolfe, Bill Howe
Published: 5-20-2025
Affiliation: New York University
Country: United States of America
Conference: International Conference on Machine Learning (ICML)

Labels Estimated by AI

Privacy Leakage, Prompt leaking, Threats of Medical AI

These labels were automatically added by AI and may be inaccurate.
For details, see About Literature Database.

Abstract

Large language models (LLMs) can leak sensitive training data through memorization and membership inference attacks. Prior work has primarily focused on strong adversarial assumptions, including attacker access to entire samples or long, ordered prefixes, leaving open the question of how vulnerable LLMs are when adversaries have only partial, unordered sample information. For example, if an attacker knows a patient has "hypertension," under what conditions can they query a model fine-tuned on patient data to learn the patient also has "osteoarthritis?" In this paper, we introduce a more general threat model under this weaker assumption and show that fine-tuned LLMs are susceptible to these fragment-specific extraction attacks. To systematically investigate these attacks, we propose two data-blind methods: (1) a likelihood ratio attack inspired by methods from membership inference, and (2) a novel approach, PRISM, which regularizes the ratio by leveraging an external prior. Using examples from both medical and legal settings, we show that both methods are competitive with a data-aware baseline classifier that assumes access to labeled in-distribution data, underscoring their robustness.
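As a rough illustration of the likelihood-ratio idea mentioned in the abstract, the sketch below scores a candidate fragment by comparing the probability a fine-tuned (target) model assigns to it against the probability a reference model assigns, both conditioned on the fragments the attacker already knows. This is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the zero threshold, and the example probabilities are all hypothetical, and the probabilities are passed in directly rather than queried from real models.

```python
import math


def likelihood_ratio_score(p_target: float, p_reference: float,
                           eps: float = 1e-12) -> float:
    """Log-ratio of a candidate fragment's probability under two models.

    p_target:    probability the fine-tuned model assigns to the candidate
                 fragment given the known fragments (assumed input).
    p_reference: probability a reference model assigns to the same
                 candidate under the same prompt (assumed input).
    eps:         floor that keeps the logarithm defined for zero inputs.
    """
    return math.log(max(p_target, eps)) - math.log(max(p_reference, eps))


def infer_fragment(score: float, threshold: float = 0.0) -> bool:
    """Flag the candidate as likely present when the score exceeds a threshold.

    The threshold value is a hypothetical choice; in practice it would
    need to be calibrated.
    """
    return score > threshold


# Hypothetical numbers: the attacker knows "hypertension" and asks how
# strongly each model predicts "osteoarthritis" for the same record.
score = likelihood_ratio_score(p_target=0.08, p_reference=0.02)
flagged = infer_fragment(score)
```

A positive score means the fine-tuned model favors the candidate fragment more than the reference model does, which is the signal such an attack would rely on. How the reference model is chosen and how the threshold is set are left open here.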

External Datasets

MTS-Dialog

Free Law