On Reconstructing Training Data From Bayesian Posteriors and Trained Models

Authors: George Wynne
Published: 2025-07-24

Source: https://arxiv.org/abs/2507.18372

Labels Predicted by AI

Adversarial Learning, Watermark Evaluation, Reconstruction Attack

Please note that these labels were automatically added by AI. Therefore, they may not be entirely accurate.

Abstract

Publicly releasing the specification of a model with its trained parameters means an adversary can attempt to reconstruct information about the training data via training data reconstruction attacks, a major vulnerability of modern machine learning methods. This paper makes three primary contributions: establishing a mathematical framework to express the problem; characterising the features of the training data that are vulnerable via a maximum mean discrepancy equivalence; and outlining a score matching framework for reconstructing data in both Bayesian and non-Bayesian models, the former being a first in the literature.
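
For readers unfamiliar with the maximum mean discrepancy (MMD) named in the abstract, the following minimal sketch computes a standard biased estimate of the squared MMD between two samples. It is an illustration of the general quantity only: the Gaussian kernel, the bandwidth, the synthetic data, and all function names are assumptions made here, and the sketch does not reproduce the paper's equivalence result or its reconstruction method.

```python
import math
import random


def gaussian_kernel(u, v, bandwidth=1.0):
    """Gaussian (RBF) kernel between two points (assumed kernel choice)."""
    sq_dist = sum((ui - vi) ** 2 for ui, vi in zip(u, v))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


def mean_kernel(xs, ys, bandwidth=1.0):
    """Average kernel value over all pairs (x, y) with x in xs and y in ys."""
    total = sum(gaussian_kernel(x, y, bandwidth) for x in xs for y in ys)
    return total / (len(xs) * len(ys))


def mmd_squared(xs, ys, bandwidth=1.0):
    """Biased (V-statistic) estimate of squared MMD between samples xs and ys."""
    return (
        mean_kernel(xs, xs, bandwidth)
        + mean_kernel(ys, ys, bandwidth)
        - 2.0 * mean_kernel(xs, ys, bandwidth)
    )


# Hypothetical data: a "training set" and two candidate reconstructions.
rng = random.Random(0)
train = [(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(100)]
close_guess = [(a + rng.gauss(0.0, 0.05), b + rng.gauss(0.0, 0.05)) for a, b in train]
far_guess = [(rng.gauss(3.0, 1.0), rng.gauss(3.0, 1.0)) for _ in range(100)]

mmd_close = mmd_squared(train, close_guess)
mmd_far = mmd_squared(train, far_guess)
print("close guess:", mmd_close)
print("far guess:  ", mmd_far)
```

In this toy setting, a candidate reconstruction that lies near the training points yields a smaller squared MMD than one drawn from a shifted distribution, which is the sense in which MMD measures how well one sample matches another.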