Tabular Generative Models are often argued to preserve privacy by creating
synthetic datasets that resemble training data. However, auditing their
empirical privacy remains challenging, as commonly used similarity metrics fail
to effectively characterize privacy risk. Membership Inference Attacks (MIAs)
have recently emerged as a method for evaluating privacy leakage in synthetic
data, but their practical effectiveness is limited. Numerous attacks exist
across different threat models, each with distinct implementations targeting
various sources of privacy leakage, making them difficult to apply
consistently. Moreover, no single attack consistently outperforms the others,
so audits that rely on any one attack routinely underestimate privacy risk.
To address these issues, we propose a unified, model-agnostic threat
framework that deploys a collection of attacks to estimate the maximum
empirical privacy leakage in synthetic datasets. We introduce Synth-MIA, an
open-source Python library that streamlines this auditing process with a
novel testbed that integrates seamlessly into existing synthetic data
evaluation pipelines through a Scikit-Learn-like API. The library implements
13 attack methods behind this common interface, enabling fast, systematic
estimation of privacy leakage for practitioners and facilitating the
development of new attacks and experiments for researchers.
We demonstrate our framework's utility in the largest tabular synthesis
privacy benchmark to date, revealing that higher synthetic data quality
corresponds to greater privacy leakage, that similarity-based privacy metrics
show weak correlation with MIA results, and that the differentially private
generator PATEGAN can fail to preserve privacy under such attacks. This
underscores the necessity of MIA-based auditing when designing and deploying
Tabular Generative Models.
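
To make the idea of estimating maximum empirical leakage from a collection of attacks concrete, the following is a minimal sketch, not the Synth-MIA API. It assumes two toy attacks (a distance-to-closest-record score and a neighbor-count score), a deliberately leaky "generator" that adds small noise to its training records, and ROC AUC as the leakage measure; all function names, the radius, and the data are hypothetical choices made for illustration.

```python
import numpy as np


def auc(scores, is_member):
    """ROC AUC via the Mann-Whitney U statistic (ties count as one half)."""
    scores = np.asarray(scores, dtype=float)
    is_member = np.asarray(is_member, dtype=bool)
    pos, neg = scores[is_member], scores[~is_member]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return float((greater + 0.5 * ties) / (len(pos) * len(neg)))


def pairwise_distances(a, b):
    """Euclidean distance between every row of a and every row of b."""
    return np.linalg.norm(a[:, None, :] - b[None, :, :], axis=2)


def dcr_attack(candidates, synthetic):
    """Hypothetical attack 1: closer to a synthetic record => more likely a member."""
    return -pairwise_distances(candidates, synthetic).min(axis=1)


def neighbor_count_attack(candidates, synthetic, radius=0.05):
    """Hypothetical attack 2: more synthetic records nearby => more likely a member."""
    return (pairwise_distances(candidates, synthetic) <= radius).sum(axis=1).astype(float)


def max_empirical_leakage(attacks, candidates, synthetic, is_member):
    """Run every attack and report the strongest one as the leakage estimate."""
    results = {name: auc(fn(candidates, synthetic), is_member)
               for name, fn in attacks.items()}
    best = max(results, key=results.get)
    return best, results[best], results


# Toy audit: the "generator" leaks by emitting noisy copies of its training set.
rng = np.random.default_rng(0)
members = rng.normal(size=(50, 2))        # records used for training
non_members = rng.normal(size=(50, 2))    # held-out records, same distribution
synthetic = members + rng.normal(scale=0.01, size=members.shape)

candidates = np.vstack([members, non_members])
is_member = np.array([True] * len(members) + [False] * len(non_members))

attacks = {"dcr": dcr_attack, "neighbor_count": neighbor_count_attack}
best_attack, max_auc, results = max_empirical_leakage(
    attacks, candidates, synthetic, is_member
)
print(best_attack, max_auc, results)
```

Reporting the maximum over attacks, rather than the score of a single attack, reflects the observation above that no one attack dominates: an AUC near 0.5 from one attack does not rule out a much higher AUC from another.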