On the Privacy Properties of Variants on the Sparse Vector Technique

TOP 文献データベース On the Privacy Properties of Variants on the Sparse Vector Technique

arxiv

AIセキュリティポータルbot

文献データベースの情報は、自動的に収集されています。

Source

https://arxiv.org/abs/1508.07306

PDF

https://arxiv.org/pdf/1508.07306

文献情報

作者: Yan Chen,Ashwin Machanavajjhala
公開日: 2015-8-29
所属機関: Duke University
所属の国: United States of America
会議名

AIにより推定されたラベル

差分プライバシープライバシーリスク管理プライバシー保護メカニズム

※ こちらのラベルはAIによって自動的に追加されました。そのため、正確でないことがあります。
詳細は文献データベースについてをご覧ください。

Abstract

The sparse vector technique is a powerful differentially private primitive that allows an analyst to check whether queries in a stream are greater or lesser than a threshold. This technique has a unique property -- the algorithm works by adding noise with a finite variance to the queries and the threshold, and guarantees privacy that only degrades with (a) the maximum sensitivity of any one query in stream, and (b) the number of positive answers output by the algorithm. Recent work has developed variants of this algorithm, which we call {\em generalized private threshold testing}, and are claimed to have privacy guarantees that do not depend on the number of positive or negative answers output by the algorithm. These algorithms result in a significant improvement in utility over the sparse vector technique for a given privacy budget, and have found applications in frequent itemset mining, feature selection in machine learning and generating synthetic data. In this paper we critically analyze the privacy properties of generalized private threshold testing. We show that generalized private threshold testing does not satisfy \epsilon-differential privacy for any finite \epsilon. We identify a subtle error in the privacy analysis of this technique in prior work. Moreover, we show an adversary can use generalized private threshold testing to recover counts from the datasets (especially small counts) exactly with high accuracy, and thus can result in individuals being reidentified. We demonstrate our attacks empirically on real datasets.