Systematic Assessment of Tabular Data Synthesis Algorithms


arxiv

AI Security Portal bot

Information in the literature database is collected automatically.

Source

https://arxiv.org/abs/2402.06806

PDF

https://arxiv.org/pdf/2402.06806

Paper Information

Authors: Yuntao Du; Ninghui Li
Published: 2024-02-10
Updated: 2024-04-13
Affiliation: Purdue University
Country: United States of America
Conference

Labels Estimated by AI

Data Privacy Assessment; Data Generation; Privacy Protection Method

These labels were automatically added by AI and may be inaccurate.
For details, see About Literature Database.

Abstract

Data synthesis has been advocated as an important approach for utilizing data while protecting data privacy. A large number of tabular data synthesis algorithms (which we call synthesizers) have been proposed. Some synthesizers satisfy Differential Privacy, while others aim to provide privacy in a heuristic fashion. A comprehensive understanding of the strengths and weaknesses of these synthesizers remains elusive due to drawbacks in evaluation metrics and missing head-to-head comparisons of newly developed synthesizers that take advantage of diffusion models and large language models with state-of-the-art marginal-based synthesizers. In this paper, we present a systematic evaluation framework for assessing tabular data synthesis algorithms. Specifically, we examine and critique existing evaluation metrics, and introduce a set of new metrics in terms of fidelity, privacy, and utility to address their limitations. Based on the proposed metrics, we also devise a unified objective for tuning, which can consistently improve the quality of synthetic data for all methods. We conducted extensive evaluations of 8 different types of synthesizers on 12 real-world datasets and identified some interesting findings, which offer new directions for privacy-preserving data synthesis.
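The abstract does not say which fidelity, privacy, or utility metrics the paper proposes. Purely as an illustration of what a fidelity-style check on synthetic tabular data can look like, the sketch below compares the one-way marginals of a real table and a synthetic table using total variation distance. The choice of metric, the function names `marginal_tvd` and `mean_marginal_tvd`, and the toy rows are all assumptions made for this example and are not taken from the paper.

```python
from collections import Counter


def marginal_tvd(real_column, synthetic_column):
    """Total variation distance between the empirical distributions
    of two categorical columns (0 = identical, 1 = disjoint)."""
    real_counts = Counter(real_column)
    synth_counts = Counter(synthetic_column)
    n_real = len(real_column)
    n_synth = len(synthetic_column)
    categories = set(real_counts) | set(synth_counts)
    # Counter returns 0 for categories that are missing on one side.
    return 0.5 * sum(
        abs(real_counts[c] / n_real - synth_counts[c] / n_synth)
        for c in categories
    )


def mean_marginal_tvd(real_rows, synthetic_rows):
    """Average the per-column distance over all columns.
    Rows are tuples with the same number of fields."""
    real_cols = list(zip(*real_rows))
    synth_cols = list(zip(*synthetic_rows))
    distances = [marginal_tvd(r, s) for r, s in zip(real_cols, synth_cols)]
    return sum(distances) / len(distances)


# Hypothetical toy data: two categorical columns, four rows each.
real = [("a", "x"), ("a", "y"), ("b", "x"), ("b", "y")]
synthetic = [("a", "x"), ("a", "x"), ("a", "y"), ("b", "y")]

score = mean_marginal_tvd(real, synthetic)
print(f"mean marginal TVD: {score:.3f}")
```

A lower score means the synthetic marginals track the real ones more closely. A check like this covers fidelity only; it says nothing about privacy or downstream utility, which the abstract treats as separate evaluation axes.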

External Datasets

Adult

Shoppers

Phishing

Magic

Faults

Bean

Obesity

Robot

Abalone

News

Insurance

Wine