Differentially Private Multi-Site Treatment Effect Estimation

Source

https://arxiv.org/abs/2310.06237

PDF

https://arxiv.org/pdf/2310.06237

Paper Information

Authors: Tatsuki Koga; Kamalika Chaudhuri; David Page
Published: 10-10-2023
Affiliation: Dept. of Computer Science and Engineering, University of California, San Diego
Country: United States of America
Conference: Conference on Secure and Trustworthy Machine Learning (SaTML)

Labels Estimated by AI

Information Hiding Techniques; Privacy Classification; Performance Evaluation

These labels were automatically added by AI and may be inaccurate.

Abstract

Patient privacy is a major barrier to healthcare AI. For confidentiality reasons, most patient data remains siloed in separate hospitals, preventing the design of data-driven healthcare AI systems that need large volumes of patient data to make effective decisions. One solution is collective learning across multiple sites through federated learning with differential privacy. However, the literature in this space typically focuses on differentially private statistical estimation and machine learning, which differ from the causal inference problems that arise in healthcare. In this work, we take a fresh look at federated learning with a focus on causal inference. Specifically, we consider estimating the average treatment effect (ATE), an important task in causal inference for healthcare applications, and provide a federated analytics approach that enables ATE estimation across multiple sites with differential privacy (DP) guarantees at each site. The main challenge comes from site heterogeneity: different sites have different sample sizes and privacy budgets. We address this through a class of per-site estimation algorithms that report the ATE estimate together with its variance as a quality measure, and a server-side aggregation algorithm that minimizes the overall variance of the final ATE estimate. Our experiments on real and synthetic data show that our method reliably aggregates private statistics across sites and provides a better privacy-utility tradeoff under site heterogeneity than baselines.
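
The server-side aggregation described in the abstract can be illustrated with a minimal sketch. It assumes that each site reports an unbiased ATE estimate together with a variance that already accounts for both sampling error and DP noise, and that the site estimates are independent. Under those assumptions, inverse-variance weighting is the standard minimum-variance linear combination. The function name `aggregate_ate` and its interface are hypothetical and are not taken from the paper, whose actual algorithm may differ.

```python
from typing import Sequence, Tuple


def aggregate_ate(
    estimates: Sequence[float], variances: Sequence[float]
) -> Tuple[float, float]:
    """Combine per-site ATE estimates by inverse-variance weighting.

    Hypothetical helper for illustration only. Assumes the site estimates
    are independent and unbiased, and that each reported variance is
    strictly positive and already includes the DP noise variance.
    """
    if len(estimates) == 0 or len(estimates) != len(variances):
        raise ValueError("need one variance per estimate, and at least one site")
    if any(v <= 0 for v in variances):
        raise ValueError("variances must be strictly positive")

    # A site with a smaller variance (larger sample or looser privacy
    # budget) receives a proportionally larger weight.
    precisions = [1.0 / v for v in variances]
    total_precision = sum(precisions)

    ate = sum(p * e for p, e in zip(precisions, estimates)) / total_precision
    # Variance of the weighted average under the independence assumption.
    variance = 1.0 / total_precision
    return ate, variance


if __name__ == "__main__":
    # Three illustrative sites; the third is noisy and is down-weighted.
    site_estimates = [0.10, 0.12, 0.40]
    site_variances = [0.01, 0.02, 0.50]
    print(aggregate_ate(site_estimates, site_variances))
```

A noisy site, for example one with few patients or a strict privacy budget, contributes little to the combined estimate, which is one simple way to handle the site heterogeneity mentioned above.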

External Datasets

International Stroke Trial (IST)

Tennessee’s Student Teacher Achievement Ratio (STAR)

Infant Health and Development Program (IHDP)

Lalonde

Synth