How much does a machine learning algorithm leak about its training data, and
why? Membership inference attacks are used as an auditing tool to quantify this
leakage. In this paper, we present a comprehensive \textit{hypothesis testing
framework} that enables us not only to formally express the prior work in a
consistent way, but also to design new membership inference attacks that use
reference models to achieve significantly higher power (true positive rate) at
any given error (false positive rate). More importantly, we explain \textit{why}
different attacks perform differently. We present a template for
indistinguishability games, and provide an interpretation of attack success
rate across different instances of the game. We discuss various uncertainties
of attackers that arise from the formulation of the problem, and show how our
approach tries to reduce the attacker's uncertainty to the one-bit secret about
the presence or absence of a data point in the training set. We perform a
\textit{differential analysis} between all types of attacks, explain the gap
between them, and show what makes data points vulnerable to an attack (the
reasons vary with the granularity of memorization, ranging from
overfitting to conditional memorization). Our auditing framework is openly
accessible as part of the \textit{Privacy Meter} software tool.
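
To make the reference-model idea concrete, the sketch below illustrates one way
such a per-example hypothesis test can be approximated empirically; it is not
the paper's implementation, and the names (\texttt{membership\_score},
\texttt{reference\_losses}) and the loss-percentile scoring rule are assumptions
made for illustration. The target model's loss on a query point is compared
against the losses of reference models trained without that point, and
thresholding the resulting score at a level $\alpha$ controls the false
positive rate of the test.
\begin{verbatim}
# Minimal sketch (illustrative, not the paper's implementation) of a
# reference-model membership inference score.  Hypotheses:
#   H_out: the query point was NOT in the target model's training set
#   H_in : the query point WAS in the training set
# Reference models are trained on data excluding the query point, so
# their losses on it approximate the null (H_out) distribution.
import numpy as np

def membership_score(target_loss, reference_losses):
    """Empirical p-value of the target model's loss under H_out.

    target_loss: loss of the audited model on the query point.
    reference_losses: losses of reference models (trained without the
        query point) on that same point.  A small p-value means the
        target loss is unusually low, i.e. evidence of membership.
    """
    reference_losses = np.asarray(reference_losses)
    # Fraction of reference models whose loss is at most the target's
    # (with add-one smoothing to avoid a zero p-value).
    below = np.sum(reference_losses <= target_loss)
    return (1 + below) / (1 + len(reference_losses))

def predict_member(target_loss, reference_losses, alpha=0.05):
    # Reject H_out (declare "member") when the p-value is below alpha;
    # alpha upper-bounds the false positive rate of the test.
    return membership_score(target_loss, reference_losses) < alpha

# Hypothetical usage with made-up loss values:
print(predict_member(target_loss=0.02,
                     reference_losses=[0.9, 1.1, 0.7, 1.3, 0.8]))
\end{verbatim}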