These labels were automatically added by AI and may be inaccurate. For details, see About Literature Database.
Abstract
Clustering problems (such as $k$-means and $k$-median) are fundamental
unsupervised machine learning primitives, and streaming clustering algorithms
have been extensively studied in the past. However, since data privacy becomes
a central concern in many real-world applications, non-private clustering
algorithms may not be as applicable in many scenarios.
In this work, we provide the first differentially private algorithms for
$k$-means and $k$-median clustering of $d$-dimensional Euclidean data points
over a stream with length at most $T$ using space that is sublinear (in $T$) in
the continual release setting where the algorithm is required to output a
clustering at every timestep.
We achieve (1) an $O(1)$-multiplicative approximation with $\tilde{O}(k^{1.5}
\cdot poly(d,\log(T)))$ space and $poly(k,d,\log(T))$ additive error, or (2) a
$(1+\gamma)$-multiplicative approximation with
$\tilde{O}_\gamma(poly(k,2^{O_\gamma(d)},\log(T)))$ space for any $\gamma>0$,
and the additive error is $poly(k,2^{O_\gamma(d)},\log(T))$. Our main technical
contribution is a differentially private clustering framework for data streams
which only requires an offline DP coreset or clustering algorithm as a
blackbox.