Differentially Private Linear Regression with Linked Data

Source

https://arxiv.org/abs/2308.00836

PDF

https://arxiv.org/pdf/2308.00836

Paper Information

Author: Shurong Lin;Elliot Paquette;Eric D. Kolaczyk
Published: 8-2-2023
Updated: 5-8-2024
Affiliation: Department of Mathematics and Statistics, Boston University
Country: United States of America
Conference: Computing Research Repository (CoRR)

Labels Estimated by AI

Privacy Protection Method; Secure Logistic Regression; Data Generation

These labels were automatically added by AI and may be inaccurate.

Abstract

There has been increasing demand for establishing privacy-preserving methodologies for modern statistics and machine learning. Differential privacy, a mathematical notion from computer science, is a rising tool offering robust privacy guarantees. Recent work focuses primarily on developing differentially private versions of individual statistical and machine learning tasks, with nontrivial upstream pre-processing typically not incorporated. An important example is when record linkage is done prior to downstream modeling. Record linkage refers to the statistical task of linking two or more data sets of the same group of entities without a unique identifier. This probabilistic procedure brings additional uncertainty to the subsequent task. In this paper, we present two differentially private algorithms for linear regression with linked data. In particular, we propose a noisy gradient method and a sufficient statistics perturbation approach for the estimation of regression coefficients. We investigate the privacy-accuracy tradeoff by providing finite-sample error bounds for the estimators, which allows us to understand the relative contributions of linkage error, estimation error, and the cost of privacy. The variances of the estimators are also discussed. We demonstrate the performance of the proposed algorithms through simulations and an application to synthetic data.
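To make the idea of sufficient statistics perturbation concrete, here is a minimal Python sketch for ordinary linear regression on already clean data. It is not the paper's algorithm and it ignores linkage error entirely. The function names (`clip_rows`, `ssp_linear_regression`), the clipping bounds, the add/remove-one-record neighboring relation, the classical Gaussian-mechanism calibration, and the `ridge` term are all assumptions made for illustration.

```python
import numpy as np


def clip_rows(X, max_norm=1.0):
    """Scale each row of X so that its Euclidean norm is at most max_norm."""
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    return X * np.minimum(1.0, max_norm / np.maximum(norms, 1e-12))


def ssp_linear_regression(X, y, epsilon, delta, ridge=1.0, rng=None):
    """Illustrative (epsilon, delta)-DP regression via noisy X^T X and X^T y.

    Assumptions (not taken from the paper): rows of X are clipped to norm <= 1,
    responses are clipped to [-1, 1], neighboring data sets differ by adding or
    removing one record, and noise follows the classical Gaussian mechanism,
    which requires 0 < epsilon < 1.
    """
    if not (0.0 < epsilon < 1.0) or not (0.0 < delta < 1.0):
        raise ValueError("this sketch assumes 0 < epsilon < 1 and 0 < delta < 1")
    rng = np.random.default_rng() if rng is None else rng

    X = clip_rows(np.asarray(X, dtype=float))
    y = np.clip(np.asarray(y, dtype=float), -1.0, 1.0)
    d = X.shape[1]

    # One record contributes x x^T (Frobenius norm <= 1) and x * y (norm <= 1),
    # so the joint L2 sensitivity is at most sqrt(2) under the assumptions above.
    sensitivity = np.sqrt(2.0)
    sigma = sensitivity * np.sqrt(2.0 * np.log(1.25 / delta)) / epsilon

    # Draw noise for the upper triangle and mirror it to keep X^T X symmetric.
    upper = np.triu(rng.normal(0.0, sigma, size=(d, d)))
    noise_xx = upper + np.triu(upper, k=1).T
    noise_xy = rng.normal(0.0, sigma, size=d)

    xtx = X.T @ X + noise_xx
    xty = X.T @ y + noise_xy

    # A small ridge term (an assumption) stabilizes the solve when noise is large.
    return np.linalg.solve(xtx + ridge * np.eye(d), xty)
```

Because the noise scale does not grow with the sample size while `X.T @ X` does, the privacy cost in this sketch shrinks as more records are available. With linked data, mismatched records would add a further bias that this sketch does not correct.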

External Datasets

Freely Extensible Biomedical Record Linkage (Febrl)

Survey on Household Income and Wealth (SHIW)