AI Security Portal K Program
Systematic Assessment of Tabular Data Synthesis Algorithms
Abstract
Data synthesis has been advocated as an important approach for using data while protecting privacy. Many tabular data synthesis algorithms (which we call synthesizers) have been proposed. Some satisfy differential privacy (DP), while others aim to provide privacy heuristically. A comprehensive understanding of their strengths and weaknesses remains elusive for two reasons: existing evaluation metrics have shortcomings, and newly developed synthesizers built on diffusion models and large language models have not been compared head-to-head with state-of-the-art marginal-based synthesizers. In this paper, we present a systematic framework for evaluating tabular data synthesis algorithms. Specifically, we examine and critique existing evaluation metrics and introduce a set of new metrics covering fidelity, privacy, and utility that address their limitations. Building on these metrics, we also devise a unified tuning objective that can consistently improve the quality of synthetic data across all methods. We conducted extensive evaluations of 8 types of synthesizers on 12 real-world datasets, and our findings point to new directions for privacy-preserving data synthesis.
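The abstract names fidelity as one of three evaluation axes but does not define the metrics on this page. As a minimal sketch, assuming each table is given as a mapping from column names to Python lists and that a per-column (1-way marginal) comparison is acceptable, the code below scores fidelity with two standard distances: the 1-Wasserstein distance for numerical columns and the total variation distance for categorical columns. This is a generic illustration, not the metrics proposed in the paper; the function names `wasserstein_1d`, `total_variation`, and `marginal_fidelity` are hypothetical.

```python
from bisect import bisect_right
from collections import Counter

# Illustrative only: generic marginal distances, not the paper's metrics.


def wasserstein_1d(real, synth):
    """1-Wasserstein distance between two empirical 1-D distributions.

    Computed as the integral of |F_real(x) - F_synth(x)| over x, where F is
    the empirical CDF. The two samples may have different sizes.
    """
    a, b = sorted(real), sorted(synth)
    if not a or not b:
        raise ValueError("both samples must be non-empty")
    points = sorted(set(a) | set(b))
    total = 0.0
    for left, right in zip(points, points[1:]):
        # Both empirical CDFs are constant on the interval [left, right).
        f = bisect_right(a, left) / len(a)
        g = bisect_right(b, left) / len(b)
        total += abs(f - g) * (right - left)
    return total


def total_variation(real, synth):
    """Total variation distance between two categorical distributions."""
    p, q = Counter(real), Counter(synth)
    n, m = len(real), len(synth)
    categories = set(p) | set(q)
    # Counter returns 0 for categories missing from one side.
    return 0.5 * sum(abs(p[c] / n - q[c] / m) for c in categories)


def marginal_fidelity(real, synth, numeric):
    """Per-column distance between real and synthetic marginals (lower is better).

    `real` and `synth` map column names to lists of values; `numeric` is the
    set of column names treated as numerical (all others are categorical).
    """
    report = {}
    for col in real:
        if col in numeric:
            report[col] = wasserstein_1d(real[col], synth[col])
        else:
            report[col] = total_variation(real[col], synth[col])
    return report


real = {"age": [23, 35, 41, 52], "sex": ["F", "M", "F", "M"]}
synth = {"age": [25, 35, 40, 60], "sex": ["F", "F", "F", "M"]}
print(marginal_fidelity(real, synth, numeric={"age"}))
# {'age': 2.75, 'sex': 0.25}
```

Because the Wasserstein distance is expressed in the column's own units, scores for different numerical columns are not directly comparable unless the columns are normalized first. Comparing only 1-way marginals also ignores dependencies between columns, which a fuller fidelity assessment would need to capture.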