Releasing full data records is one of the most challenging problems in data
privacy. On the one hand, many of the popular techniques such as data
de-identification are problematic because of their dependence on the background
knowledge of adversaries. On the other hand, rigorous methods such as the
exponential mechanism for differential privacy are often computationally
impractical to use for releasing high dimensional data or cannot preserve high
utility of original data due to their extensive data perturbation.
This paper presents a criterion called plausible deniability that provides a
formal privacy guarantee, notably for releasing sensitive datasets: an output
record can be released only if a certain amount of input records are
indistinguishable, up to a privacy parameter. This notion does not depend on
the background knowledge of an adversary. Also, it can efficiently be checked
by privacy tests. We present mechanisms to generate synthetic datasets with
similar statistical properties to the input data and the same format. We study
this technique both theoretically and experimentally. A key theoretical result
shows that, with proper randomization, the plausible deniability mechanism
generates differentially private synthetic data. We demonstrate the efficiency
of this generative technique on a large dataset; it is shown to preserve the
utility of original data with respect to various statistical analysis and
machine learning measures.