With the development of Big Data and cloud data sharing, privacy-preserving
data publishing has become one of the most important topics of the past decade. As
one of the most influential privacy definitions, differential privacy provides
a rigorous and provable privacy guarantee for data publishing. Differentially
private interactive publishing achieves good performance in many applications;
however, the curator has to release a large number of queries in a batch or a
synthetic dataset in the Big Data era. To provide accurate non-interactive
publishing results under the constraint of differential privacy, two challenges
need to be tackled: one is how to decrease the correlation among a large set
of queries, and the other is how to predict answers to fresh queries. Neither is easy
to solve with traditional differential privacy mechanisms. This paper
transforms the data publishing problem into a machine learning problem, in which
queries are treated as training samples and a prediction model is
released rather than query results or synthetic datasets. Once the model is
published, it can be used to answer currently submitted queries and to predict
results for fresh queries from the public. Compared with the traditional
method, the proposed prediction model enhances the accuracy of query results
for non-interactive publishing. Experimental results show that the proposed
solution outperforms traditional differential privacy in terms of Mean Absolute
Value on a large group of queries. This also suggests that the learning model can
successfully retain the utility of published queries while preserving privacy.
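
As an illustration only, the idea of publishing a model learned from queries can be sketched as follows. This is a minimal sketch under assumptions that the abstract does not state: the dataset is a histogram of counts, queries are 0/1 range-count vectors, training answers are perturbed with the standard Laplace mechanism under sequential composition, and the released model is a ridge regression fitted to the noisy answers. The function names (`laplace_answers`, `fit_query_model`, `predict`, `random_range_queries`) and the `ridge` parameter are hypothetical and are not taken from the proposed solution.

```python
import numpy as np


def random_range_queries(m, d, rng):
    """Draw m random range-count queries over d histogram bins (0/1 rows)."""
    lo = rng.integers(0, d, size=m)
    hi = rng.integers(0, d, size=m)
    lo, hi = np.minimum(lo, hi), np.maximum(lo, hi)
    idx = np.arange(d)
    return ((idx >= lo[:, None]) & (idx <= hi[:, None])).astype(float)


def laplace_answers(hist, queries, epsilon, rng):
    """Answer every query with Laplace noise.

    Assumption: each counting query has sensitivity 1 and the total budget
    epsilon is split evenly over the m queries (sequential composition),
    so each answer receives noise of scale m / epsilon.
    """
    m = queries.shape[0]
    scale = m / epsilon
    return queries @ hist + rng.laplace(0.0, scale, size=m)


def fit_query_model(queries, noisy_answers, ridge=1.0):
    """Fit a ridge regression from query vectors to noisy answers.

    The model only sees noisy answers, so releasing it is post-processing
    of a differentially private output.
    """
    d = queries.shape[1]
    gram = queries.T @ queries + ridge * np.eye(d)
    return np.linalg.solve(gram, queries.T @ noisy_answers)


def predict(model, queries):
    """Use the released model to answer submitted or fresh queries."""
    return queries @ model


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    hist = rng.integers(0, 100, size=32).astype(float)  # toy private data
    train_q = random_range_queries(200, 32, rng)
    fresh_q = random_range_queries(50, 32, rng)

    noisy = laplace_answers(hist, train_q, epsilon=1.0, rng=rng)
    model = fit_query_model(train_q, noisy, ridge=10.0)

    # Mean absolute error against the true answers, for comparison only.
    print("noisy answers, training queries:", np.mean(np.abs(noisy - train_q @ hist)))
    print("model, training queries:        ", np.mean(np.abs(predict(model, train_q) - train_q @ hist)))
    print("model, fresh queries:           ", np.mean(np.abs(predict(model, fresh_q) - fresh_q @ hist)))
```

The sketch releases only the fitted weight vector, which can then answer queries that were never submitted during training. Whether the model improves on the raw noisy answers depends on the data, the query workload, and the regularization strength, so no particular outcome is implied here.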