Abstract
Linear regression is one of the most prevalent techniques in machine
learning; however, it is also common to use linear regression for its
\emph{explanatory} capabilities rather than for label prediction. Ordinary Least
Squares (OLS) is often used in statistics to establish a correlation between an
attribute (e.g.\ gender) and a label (e.g.\ income) in the presence of other
(potentially correlated) features. OLS assumes a particular model that randomly
generates the data, and derives \emph{$t$-values}, which represent the
likelihood of each real value being the true correlation. Using $t$-values, OLS
can release a \emph{confidence interval}: an interval on the reals that is
likely to contain the true correlation. When this interval does not intersect
the origin, we can \emph{reject the null hypothesis}, as it is likely that the
true correlation is non-zero. Our work aims at achieving similar guarantees on
data under differentially private estimators. First, we show that for
well-spread data, the Gaussian Johnson-Lindenstrauss Transform (JLT) gives a
very good approximation of $t$-values. Second, when the JLT approximates Ridge
regression (linear regression with $\ell_2$-regularization), we derive, under
certain conditions, confidence intervals using the projected data. Lastly, we
derive, under different conditions, confidence intervals for the ``Analyze
Gauss'' algorithm (Dwork et al., STOC 2014).