Abstract
There has been increasing demand for establishing privacy-preserving
methodologies for modern statistics and machine learning. Differential privacy,
a mathematical notion from computer science, is an increasingly adopted tool offering robust
privacy guarantees. Recent work focuses primarily on developing differentially
private versions of individual statistical and machine learning tasks, typically
without accounting for nontrivial upstream pre-processing. An important
example is when record linkage is done prior to downstream modeling. Record
linkage refers to the statistical task of linking two or more data sets of the
same group of entities without a unique identifier. This probabilistic
procedure brings additional uncertainty to the subsequent task. In this paper,
we present two differentially private algorithms for linear regression with
linked data. In particular, we propose a noisy gradient method and a sufficient
statistics perturbation approach for the estimation of regression coefficients.
We investigate the privacy-accuracy tradeoff by providing finite-sample error
bounds for the estimators, which allow us to understand the relative
contributions of linkage error, estimation error, and the cost of privacy. The
variances of the estimators are also discussed. We demonstrate the performance
of the proposed algorithms through simulations and an application to synthetic
data.
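
To make the idea of sufficient statistics perturbation concrete, the following is a minimal sketch for ordinary linear regression only. It is not the algorithm proposed in the paper: it ignores linkage error entirely, and the function name `ssp_linear_regression`, the clipping bounds, the ridge term `ridge`, and the use of the classical Gaussian mechanism calibration (stated for epsilon at most 1, add/remove-one-record neighbors) are all assumptions made for illustration.

```python
import numpy as np


def ssp_linear_regression(X, y, epsilon, delta, ridge=1.0, rng=None):
    """Illustrative sufficient statistics perturbation (hypothetical helper).

    Assumptions (not from the paper): rows of X are clipped to L2 norm <= 1,
    y is clipped to [-1, 1], neighbors differ by adding/removing one record,
    and noise follows the classical Gaussian mechanism (requires epsilon <= 1).
    """
    if not (0.0 < epsilon <= 1.0) or not (0.0 < delta < 1.0):
        raise ValueError("this sketch assumes 0 < epsilon <= 1 and 0 < delta < 1")
    rng = np.random.default_rng() if rng is None else rng

    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)

    # Clip each record so the sensitivity bound below holds.
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    X = X * np.minimum(1.0, 1.0 / np.maximum(norms, 1e-12))
    y = np.clip(y, -1.0, 1.0)

    d = X.shape[1]
    xtx = X.T @ X  # one record changes this by at most 1 in Frobenius norm
    xty = X.T @ y  # one record changes this by at most 1 in L2 norm

    # Joint L2 sensitivity of (X^T X, X^T y) is at most sqrt(1^2 + 1^2).
    sensitivity = np.sqrt(2.0)
    sigma = sensitivity * np.sqrt(2.0 * np.log(1.25 / delta)) / epsilon

    noisy_xtx = xtx + rng.normal(0.0, sigma, size=(d, d))
    noisy_xtx = (noisy_xtx + noisy_xtx.T) / 2.0  # symmetrize (post-processing)
    noisy_xty = xty + rng.normal(0.0, sigma, size=d)

    # The ridge term keeps the noisy system well conditioned (an assumption).
    return np.linalg.solve(noisy_xtx + ridge * np.eye(d), noisy_xty)
```

In this sketch the privacy noise has a fixed scale while the entries of X^T X grow with the sample size, so the cost of privacy shrinks relative to the signal as more records are available. A version for linked data would additionally need to correct the sufficient statistics for mismatched records, which is not attempted here.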
External Datasets
Freely Extensible Biomedical Record Linkage (Febrl)