Abstract
Linear regression is one of the most prevalent techniques in machine
learning; however, it is also common to use linear regression for its
\emph{explanatory} capabilities rather than for label prediction. Ordinary Least
Squares (OLS) is often used in statistics to establish a correlation between an
attribute (e.g.\ gender) and a label (e.g.\ income) in the presence of other
(potentially correlated) features. OLS assumes a particular model that randomly
generates the data, and derives \emph{$t$-values}, which represent the
likelihood of each real value being the true correlation. Using $t$-values, OLS
can release a \emph{confidence interval}: an interval on the reals that is
likely to contain the true correlation. When this interval does not intersect
the origin, we can \emph{reject the null hypothesis}, as it is likely that the
true correlation is non-zero. Our work aims at achieving similar guarantees on
data under differentially private estimators. First, we show that for
well-spread data, the Gaussian Johnson-Lindenstrauss Transform (JLT) gives a
very good approximation of $t$-values. Second, when the JLT approximates Ridge
regression (linear regression with $\ell_2$-regularization), we derive, under
certain conditions, confidence intervals using the projected data. Lastly, we
derive, under different conditions, confidence intervals for the ``Analyze
Gauss'' algorithm (Dwork et al., STOC 2014).