Threshold KNN-Shapley: A Linear-Time and Privacy-Friendly Approach to Data Valuation

arXiv

Information in this literature database is collected automatically.

Source

https://arxiv.org/abs/2308.15709

PDF

https://arxiv.org/pdf/2308.15709

Bibliographic Information

Authors: Jiachen T. Wang; Yuqing Zhu; Yu-Xiang Wang; Ruoxi Jia; Prateek Mittal
Published: 2023-08-30
Updated: 2023-11-26
Affiliation: Princeton University
Country of affiliation: United States of America
Venue: Computing Research Repository (CoRR)

AI-Estimated Labels

Privacy-preserving techniques; Computational efficiency; Data generation

Note: These labels were added automatically by AI and may therefore be inaccurate.
For details, see "About the Literature Database".

Abstract

Data valuation aims to quantify the usefulness of individual data sources in training machine learning (ML) models, and is a critical aspect of data-centric ML research. However, data valuation faces significant yet frequently overlooked privacy challenges despite its importance. This paper studies these challenges with a focus on KNN-Shapley, one of the most practical data valuation methods nowadays. We first emphasize the inherent privacy risks of KNN-Shapley, and demonstrate the significant technical difficulties in adapting KNN-Shapley to accommodate differential privacy (DP). To overcome these challenges, we introduce TKNN-Shapley, a refined variant of KNN-Shapley that is privacy-friendly, allowing for straightforward modifications to incorporate DP guarantee (DP-TKNN-Shapley). We show that DP-TKNN-Shapley has several advantages and offers a superior privacy-utility tradeoff compared to naively privatized KNN-Shapley in discerning data quality. Moreover, even non-private TKNN-Shapley achieves comparable performance as KNN-Shapley. Overall, our findings suggest that TKNN-Shapley is a promising alternative to KNN-Shapley, particularly for real-world applications involving sensitive data.